ABSTRACT


This work investigates issues related to distribution of low-bit-rate video within 
the context of a teleconferencing application deployed over a tactical ATM network. The 
main objective is to develop mechanisms that support transmission of low bit rate video 
streams as a series of scalable layers that progressively improve quality. The hierarchical
nature of the layered video stream is actively exploited along the transmission path from
the sender to the recipients to facilitate transmission. 

A new layered coder design tailored to video teleconferencing in the tactical 
environment is proposed. Macroblocks selected due to scene motion are layered via 
subband decomposition using the fast Haar transform. A generalized layering scheme 
groups the subbands to form an arbitrary number of layers. As a layering scheme suitable 
for low-motion video is unsuitable for static slides, the coder adapts the layering scheme 
to the video content. A suboptimal rate control mechanism is investigated. Because a
separate quantizer is tailored to each layer, rate control is a k-dimensional
rate-distortion problem; the mechanism reduces it to a 1-dimensional problem by creating a
single rate-distortion curve for the coder in terms of a suboptimal set of k-dimensional
quantizer vectors. Rate control is thus simplified into a table lookup of a codebook
containing the suboptimal quantizer vectors. The rate controller is well suited to
real-time video and limits fluctuations in the bit stream without corresponding visible
fluctuations in perceptual quality.

A traffic smoother prior to network entry is developed to increase queuing and 
scheduler efficiency. Three levels of smoothing are studied: frame, layer, and cell 
interarrival. Frame level smoothing occurs via rate control at the application. 
Interleaving and cell interarrival smoothing are accomplished using a leaky bucket 
mechanism inserted prior to the adaptation layer or within the adaptation layer. 
Simulations indicate that smoothing lowers bandwidth requirements for a given quality of 
service and that interleaving cells from different layers enhances the effectiveness of
priority-based scheduling schemes.


A new cell-scheduling scheme is proposed that exploits the layered video 
hierarchy to allow more graceful degradation in visual quality during periods of cell loss. 
Quality of service at the connection level is maintained using an optimal scheduling 
algorithm that accounts for the cell loss rate and cell transfer delay requirements for each
connection. Within the connection, a prioritization scheme denies service both to cells from
lower-priority layers during periods of congestion and to cells deemed non-viable due to
group of blocks (GOB) corruption, thereby increasing the probability that cells from
higher-priority layers are transmitted. Simulations indicate that protecting higher-priority layers
requires accepting a corresponding decrease in throughput. Depending on the 
prioritization scheme used, cell loss rates for the base video layer can either be 
maintained at the desired rate or improved by an order of magnitude relative to no 
prioritization. Cell discarding allows the scheduler to recover bandwidth from non-viable
cells, although the impact within the connection depends on the service discipline. As the
GOB size increases, cell discarding is improved if cells from different layers are
interleaved to reflect spatial dependency between the base layer and the enhancement
layers.
I. INTRODUCTION


Multimedia applications support the processing, transmission, and control of 
streams of related audio-visual signals including text, images, audio, and video data [1]. 
Common examples include streaming applications, such as video-on-demand (VOD), and 
interactive applications, such as video and audio teleconferencing. Multimedia 
applications offer difficult challenges for network design due to the need to bound data 
loss in transmission, the need to limit transmission delays, and the need for synchronizing 
the related streams comprising a multimedia session. In particular, video 
teleconferencing (VTC) demonstrates the great potential of multimedia applications to 
deliver information but, at the same time, poses difficult distribution problems for the 
hosting network. 

VTC plays an important role in the U.S. Navy’s Information Technology for the 
21st Century initiative (IT-21). IT-21 seeks to transform the current platform centric
approach to warfighting to a network centric approach that leverages information 
superiority with current and planned smart weapons [2]. At the battlegroup level, 
deploying VTC over a tactical network that links individual units via a wireless link 
offers several benefits including collaborative planning, remote maintenance, distance 
learning, and telemedicine. However, the tactical network thus envisioned presents
constraints not typically present in traditional wireline networks. The tactical network 
may be viewed as an internetwork of shipboard wireline local area networks (LANs) 
interconnected by a wireless channel. The wireless channel serves as a bottleneck within 
the tactical network and constrains both the available bit rate and transmission quality.
Each of these constraints impacts the perceived quality of any deployed VTC application.


A. BACKGROUND 


This section provides additional information on the IT-21 initiative and VTC as
context for the problem scenario in the next section. Additionally, the type of
service required to support VTC is briefly considered.


1. IT-21


A brief examination of the IT-21 initiative is valuable for determining the baseline
network architecture to host a tactical VTC application. The goal of IT-21 is to link all
U.S. Forces together in a network that enables the transmission of voice, video and data
from individual workstations seamlessly to both local and remote users [2]-[4]. The
anticipated network is heterogeneous and allows connectivity among wireline LANs 
using both wireless and satellite communication links. All networks and interfaces are to 
use commercial off-the-shelf (COTS) technology built to current industry standards. 

Focusing on the battlegroup level, shipboard LANs are to have ATM backbones. 
Individual workstation connectivity is provided initially via 100 Mbps Fast Ethernet with 
a future transition to direct ATM connections. Connectivity among units of the 
battlegroup is provided by EHF links with a minimum data rate of 128 kbps to support 
messaging and maintain a common tactical picture. However, to support multimedia 
applications, such as VTC or collaborative planning with high resolution, early
projections indicate that a minimum data rate of 1280 kbps is required.
2. Video Teleconferencing


Teleconferencing systems can be broken into three categories: audio-only, audio 
and graphics, and video. VTC is an interactive application requiring low network
latency, bounded delay jitter, and low cell loss to both preserve audiovisual quality and
maintain the sense of interactivity. In addition, careful synchronization is required
between the audio and video streams. While communication may be unicast, as in
peer-to-peer applications, the more challenging problem of multicast communication is
considered here. As such, each sender is assumed to transmit to multiple receivers in the
multicast group. In turn, the multicast group consists of some combination of active
participants that receive and transmit and passive participants that receive only. This
situation is illustrated in Figure I.1.

Since video, and audio to a lesser degree, is bandwidth intensive, signals are
compressed prior to transmission, trading some reduction in quality for a reduction in
bandwidth. Multimedia communications, therefore, require dedicated terminals, which
capture and prepare signals for transmission over the network and reconstruct received
streams by decompressing and resynchronizing different streams as required.
Commercial VTC applications have been facilitated by the emergence of ITU standards
for multimedia terminals [3]. Each standard targets a bandwidth range (and thus quality)
and a particular networking standard, and it incorporates a family of associated standards
to support the required audio and video compression, control signals, and network interface.
Figure I.1: Simple VTC Multicast with Two Active and Two Passive Nodes.


3. Multimedia Applications and QoS


Quality of service (QoS) denotes a set of one or more parameters describing the 
level of service granted to an application by a network or required from the network by 
the application for acceptable performance. Many possible QoS parameters exist, but the 
typical parameters employed are maximum allowable delay, delay variation or jitter, and 
cell loss rates. The QoS requirements for a particular multimedia application depend on 
the types of information transmitted and the manner in which the information is
compressed or packaged for transmission. More generally, multimedia applications are
characterized by the manner in which information is distributed, the degree of
interactivity, and the type of information transported [1].


Multimedia communications are either unicast or multicast. Unicast represents 
peer-to-peer communication, while multicast represents m-to-n communication, where m
ranges from 1 to n. Unicast examples include client-server applications, such as VOD.
Multicast examples include distance-learning and tele-remote conferencing. As will be 
discussed later, the manner of communications between the source and recipients may 
complicate information delivery depending on the type of network employed. 

Multimedia applications are either interactive or streaming. Streaming 
applications are either unicast or multicast and are channel asymmetric: significant 
content flows in only one direction. Interactive applications tend to have content 
flowing, in part, in at least two directions although the flow may not be fully symmetric. 
Streaming applications usually do not require strict bounds on delay but are sensitive to 
delay jitter. Interactive applications usually require strict bounds on both or not at all, 
depending on the information content. 

The information flow for multimedia applications is either continuous or 
intermittent. Applications with intermittent flow are not usually delay sensitive but tend 
to tolerate cell loss poorly. Examples include text files, still images, and graphics. For 
applications with continuous flow, such as video and audio, delay sensitivity depends on 
whether the application involved is interactive or streaming as mentioned above. Some 
cell loss is acceptable for continuous flows, although the degree depends on the
information source as well as the amount of compression involved.
B. PROBLEM SCENARIO 


This section lays out scenario parameters for a tactical shipboard VTC application 
and discusses difficulties with preserving video quality using traditional video coders
over heterogeneous networks.
1. Target Scenario 


Using the IT-21 requirements as a baseline, the battlegroup tactical network is 
assumed to be a hybrid wireline/wireless ATM network. Shipboard networks employ an
ATM backbone and provide complete ATM connectivity to the desktop, offering either
native ATM services or legacy LAN emulation over ATM. An ATM wireless network
provides connectivity within the battlegroup. A centralized control station, usually the
capital ship within the battlegroup, may manage access to the wireless network. This
network is illustrated in Figure I.2.

Intrinsically, this arrangement offers asymmetric bandwidth depending on 
whether a connection remains shipboard or is ship-to-ship. Given the current capabilities 
of ATM network interface cards (NICs), workstations can expect a maximum bandwidth 
of 10-25 Mbps with correspondingly higher bandwidths across the backbone. However, 
given current technology, wireless data rates are far more limited. A reasonable 
assumption is a bandwidth of at least 1 Mbps, a value well within the capability of
commercially available technologies, such as Multichannel Multipoint Distribution 
Service (MMDS) broadband wireless transmission. MMDS offers line-of-sight (LOS) 
service in the 2.1 GHz to 2.7 GHz band with data rates up to 1.5 Mbps. Satellite links 
complete the connectivity to land-based LANs but are not considered further here since 
their high latency precludes satisfactory performance for interactive multimedia. 

The maximum quality of any multimedia application depends in part on available 
bandwidth (network services also play an equally important role). While the network 
described here provides for high bandwidth aboard individual units, networking between 
units is constrained by the wireless interface. Thus, deploying a tactical VTC application 
at the battlegroup level requires operating within this bandwidth constraint. 

To provide a basis for the work presented here, a set of reasonable requirements 
for low-bit-rate tactical VTC is proposed below using international standards where 
possible to keep within the spirit of IT-21. Given the bandwidth constraints, both the 
audio and video streams must be compressed. Toll quality speech demands far less 
bandwidth than video and can be reasonably limited to 8 kbps or less using code excited 
linear prediction (CELP) speech coding [5]. Video bandwidth requirements depend on 
the desired resolution, frame rate, color depth, and the permissible tradeoff between 
compression gain and perceptual quality. Current low-bit-rate ITU multimedia standards,
such as H.320 and H.324, use low resolutions and frame rates to enable acceptable video
quality [3]. Using these standards as a guideline, the tactical VTC transmits video signals
at 10 fps using the Quarter Common Intermediate Format (QCIF) with a resolution of
176x144 pixels and targets bit rates in the range of 64-96 kbps. The primary color depth
supported is 8-bit grayscale, although 4:2:0 sub-sampled 24-bit color [6] is a possible
option. These requirements are summarized in Table I.1.
Figure I.2: Hybrid ATM Wireline/Wireless Network.


VTC Stream Parameter    Value
Video Bandwidth         64-96 kbps
Resolution              176x144 (QCIF)
Frame Rate              10 fps
Color Depth             8-bit grayscale / 4:2:0 24-bit color
Audio Bandwidth         ≤ 8 kbps

Table I.1: Tactical VTC Multimedia Requirements.


2. Video Compression and Robustness


Given the parameters in Table I.1, a video compression gain of approximately 31
to 1 is required to transmit 8-bit grayscale, assuming an average available bit rate of 64
kbps. Such gains are easily within the capability of current video coding standards, such
as H.263 and MPEG-1/2. However, traditional video compression schemes are not
particularly suitable for multicast transmission over packet-based networks.
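The compression ratio quoted above follows from the raw bit rate implied by Table I.1. The short Python sketch below reproduces the arithmetic; the function names are illustrative only. Whether the ratio comes out nearer 31 or 32 depends on whether 64 kbps is read as 64,000 or 65,536 bits per second.

```python
# Raw bit rate of 8-bit grayscale QCIF video at 10 fps (parameters from Table I.1).
WIDTH, HEIGHT = 176, 144   # QCIF resolution in pixels
BITS_PER_PIXEL = 8         # 8-bit grayscale
FRAME_RATE = 10            # frames per second


def raw_bit_rate(width=WIDTH, height=HEIGHT, bpp=BITS_PER_PIXEL, fps=FRAME_RATE):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bpp * fps


def required_gain(target_bps, raw_bps=None):
    """Compression ratio needed to fit the raw stream into target_bps."""
    if raw_bps is None:
        raw_bps = raw_bit_rate()
    return raw_bps / target_bps


if __name__ == "__main__":
    print(raw_bit_rate())                         # 2027520 bits/s
    print(round(required_gain(64_000), 2))        # 1 kbps taken as 1000 bit/s
    print(round(required_gain(64 * 1024), 2))     # 1 kbps taken as 1024 bit/s
```

Either reading gives a required gain of roughly 31 to 32, well within what H.263-class coders achieve.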

Video codecs compress the original video stream by removing the least 
perceptually relevant content and by encoding only the differences between successive 
frames caused by motion. Unfortunately, packet-based networks invariably drop packets 
due to congestion, even in network architectures offering QoS guarantees, such as ATM 
networks. Due to the high compression gains required for transmission, each packet 
contains a significant amount of information. The loss of a single packet corrupts a 
portion of a frame or an entire frame depending on the decoder’s ability to resynchronize 
with the incoming bit stream [7]. With motion compensation, any visual error artifacts 
introduced may persist for many frames past the initial point of corruption (until the next 
I-frame in an MPEG stream and possibly indefinitely in H.263 [8]). The effect of packet
losses grows more significant as bit rate decreases.

The problem of packet losses may be mitigated within the network or at the 
application layer. Within the network, appropriate QoS guarantees can reduce cell losses 
to a level such that any quality degradation due to transmission errors is acceptable.
However, the required cell loss rates can be quite small, on the order of 10°, which 
requires a large allocation of bandwidth to achieve. Two common approaches to 
improving error robustness at the application level are to use codecs without motion 
compensation, such as Motion-JPEG [9], or to vary the bit rate in response to the 
estimated degree of congestion within the network. Motion-JPEG compresses each 
frame individually, thereby greatly improving robustness since visual artifacts are 
confined to the affected frame. However, robustness comes with lower compression 
gains, and, therefore, Motion-JPEG delivers unacceptable quality at low bit rates. If the 
source coder is controllable [11], network feedback reports can be used to modify the 
demand placed on the network by changing the quality of video transmission. While this
approach provides no inherent improvement in the error resilience of the video stream,
it does attempt to mitigate the effects of congestion on the received video stream.


However, designing a scheme for controlling the source rate is difficult when 
multicast transmission over a heterogeneous network is considered. A heterogeneous 
network may be defined as one in which end-users are stratified by available bandwidths 
and processing and display capabilities [12]. Using feedback to monitor congestion 
within the network and then making appropriate changes to the outgoing video stream 
becomes problematic as multicast group size increases or as the network topology grows 
more complex. Feedback messages may potentially add to congestion depending on the 
periodicity of transmission. More importantly, since each user represents a different path 
through the network, each connection potentially experiences a different level of 
congestion. The controllable application is faced with a quandary in responding fairly if 
only a small number of members within the multicast group are experiencing congestion. 
Stratification poses a further problem during transmission of real-time video since each 
user has different expectations and tolerances with regard to video quality. Users with 
high bandwidth expect high quality video while users with low bandwidth are generally 
satisfied with less. Meeting the varied expectations with a single video stream is clearly 
impractical, and transmitting multiple video streams with gradations in quality requires
greater bandwidth.
C. DISSERTATION OBJECTIVES


Given the interest in deploying VTC applications over tactical networks such as
those envisioned by the U.S. Navy’s IT-21 initiative, distributing the video stream while
maintaining acceptable quality involves reconciling the requirements of multimedia
applications with the capabilities of tactical networks. As discussed in the previous
section, video is bandwidth intensive and highly sensitive to transmission errors. A
tactical network may be characterized as low bit rate, unreliable, and heterogeneous.
Solving the distribution problem solely in terms of coder design or network design is less
effective than developing a unified solution that reaches across the application-network
boundary.


Accordingly, this dissertation investigates issues related to distributing low-bit- 
rate video within the context of a teleconferencing application deployed over a tactical 
ATM network. The main objective is to develop mechanisms that support transmission 
of low-bit-rate video streams as a series of scalable layers that progressively improve
quality. These mechanisms exploit the hierarchical nature of the layered video stream
along the transmission path from the sender to the recipients to facilitate transmission.
Specifically, the approach proposed in this dissertation works across the
application-network interface by coding the video stream into layers, shaping the
resulting layered video stream prior to entry into the network, and prioritizing service in
accordance with the relative perceptual importance of each layer. The resulting
distribution path is illustrated in Figure I.3.
Each of these mechanisms centers on dividing the video stream into an
independently decodable base layer that guarantees a minimum, acceptable level of 
quality and several enhancement layers that increase quality in a hierarchical manner. 
Transmitting video in layers has several inherent benefits. The layered structure provides 
a means for implementing open-loop congestion control by allowing recipients to drop 
layers exhibiting high packet loss rates, thereby reducing network loading [12]. Earlier 
work by Rhee and Gibson [13] indicates that layered video exhibits improved resilience 
to bit errors introduced during transmission since spreading bit errors across multiple 
layers has less impact on the reconstructed video. 
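Receiver-driven layer dropping of the kind described above can be sketched as a simple rule applied periodically at each receiver. The function below is a hypothetical illustration, not a mechanism defined in this dissertation: the name `adjust_subscription` and the drop and join thresholds (5% and 1% cell loss) are assumptions chosen only to make the example concrete.

```python
def adjust_subscription(subscribed, loss_rates, max_layers,
                        drop_threshold=0.05, join_threshold=0.01):
    """Return the number of layers a receiver should subscribe to next.

    subscribed  -- layers currently received; layer 0 (the base layer) is always kept
    loss_rates  -- measured cell loss rate for each subscribed layer, base layer first
    max_layers  -- total number of layers the sender transmits
    The thresholds are hypothetical values for illustration.
    """
    if subscribed < 1 or len(loss_rates) != subscribed:
        raise ValueError("need one loss measurement per subscribed layer")
    top = subscribed - 1
    # Heavy loss on the highest enhancement layer: shed it to relieve the congested path.
    if top > 0 and loss_rates[top] > drop_threshold:
        return subscribed - 1
    # Every subscribed layer is nearly loss-free: probe for one more layer.
    if subscribed < max_layers and max(loss_rates) < join_threshold:
        return subscribed + 1
    return subscribed
```

Because each receiver decides for itself, a congested branch of the multicast tree sheds enhancement layers without forcing the sender to lower quality for every member of the group.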

Here, a new layered coder design tailored to video teleconferencing in the tactical 
environment is proposed. Specifically, the coder is optimized for VTC video scenes
consisting of low-motion video, such as a “talking head,” and static scenes corresponding
to presentation slides. The concession to the tactical environment is an emphasis on
low-bit-rate coding, low-complexity coding for low delay and power requirements, and
inherent robustness to minimize the effect of packet losses and bit errors. Two major
problems are considered. The first is how to map frequency content effectively to the
requisite number of layers, thereby creating the required perceptual hierarchy. The
generalized layering scheme presented here uses the fast Haar transform to segregate
frequency content into subbands; these subbands are then grouped by perceptual
relevance to form the required number of layers. However, a layering scheme suitable
for low-motion video is unsuitable for static slides. Static slides place a much greater
emphasis on high-frequency content, and an appropriate layering scheme is included
with the coder design. The coder adapts to the current video type by shifting to the
correct layering structure.
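As a rough sketch of how the fast Haar transform separates a macroblock into subbands that can then be grouped into layers, the Python code below performs one level of 2-D Haar analysis on 2x2 pixel groups, plus its exact inverse. The subband names, the averaging normalization (division by 4), and the two groupings `MOTION_SCHEME` and `SLIDE_SCHEME` are hypothetical; they are not the layering schemes developed in Chapter IV. The slide grouping merely illustrates the idea of promoting edge (high-frequency) detail into the first layer.

```python
def haar2d(block):
    """One level of 2-D Haar analysis on an even-sized block (a list of pixel rows).

    Each 2x2 pixel group [a b; c d] yields one coefficient in each of four
    quarter-size subbands (averaging normalization, i.e. divide by 4):
      LL: local average            HL: left-right difference
      LH: top-bottom difference    HH: diagonal difference
    """
    h, w = len(block), len(block[0])
    if h % 2 or w % 2:
        raise ValueError("block dimensions must be even")
    bands = {name: [[0.0] * (w // 2) for _ in range(h // 2)]
             for name in ("LL", "HL", "LH", "HH")}
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = block[i][j], block[i][j + 1]
            c, d = block[i + 1][j], block[i + 1][j + 1]
            r, s = i // 2, j // 2
            bands["LL"][r][s] = (a + b + c + d) / 4
            bands["HL"][r][s] = (a - b + c - d) / 4
            bands["LH"][r][s] = (a + b - c - d) / 4
            bands["HH"][r][s] = (a - b - c + d) / 4
    return bands


def inverse_haar2d(bands):
    """Exact inverse of haar2d: rebuild each 2x2 pixel group from its four coefficients."""
    LL, HL, LH, HH = bands["LL"], bands["HL"], bands["LH"], bands["HH"]
    h, w = len(LL), len(LL[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for s in range(w):
            m, x, y, z = LL[r][s], HL[r][s], LH[r][s], HH[r][s]
            out[2 * r][2 * s] = m + x + y + z
            out[2 * r][2 * s + 1] = m - x + y - z
            out[2 * r + 1][2 * s] = m + x - y - z
            out[2 * r + 1][2 * s + 1] = m - x - y + z
    return out


# Hypothetical layering schemes: the subbands carried by each layer, base layer first.
MOTION_SCHEME = [["LL"], ["HL", "LH"], ["HH"]]   # low-motion video: averages first
SLIDE_SCHEME = [["LL", "HL", "LH"], ["HH"]]      # slides: edge detail promoted to the base layer


def reconstruct_from_layers(bands, scheme, n_layers):
    """Rebuild the block from the first n_layers only; missing subbands are zeroed."""
    kept = {name for layer in scheme[:n_layers] for name in layer}
    zero = [[0.0] * len(bands["LL"][0]) for _ in bands["LL"]]
    partial = {name: (band if name in kept else zero) for name, band in bands.items()}
    return inverse_haar2d(partial)
```

Decoding only the base layer of `MOTION_SCHEME` yields a block-averaged image; each additional layer restores more detail, which is the progressive-quality property the layered coder relies on.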

The second problem is developing a rate control scheme for the layered video 
coder. Rate control is a requisite for maintaining a desired QoS level in an ATM 
network, but the use of multiple quantizers complicates developing an optimal rate 
controller appropriate for a real-time application. A more appropriate alternative is a
suboptimal rate control mechanism that reduces the k-dimensional rate-distortion problem
resulting from the use of multiple quantizers to a 1-dimensional problem by creating a
single rate-distortion curve for the coder in terms of a suboptimal set of k-dimensional
quantizer vectors. Rate control can thus be simplified into a table lookup of a
codebook containing the suboptimal quantizer vectors.
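The table-lookup form of rate control can be sketched as follows. Each codebook entry pairs a k-dimensional quantizer vector (one step size per layer, here k = 3) with an average coded frame size, and the entries are ordered by increasing rate so that they trace a single 1-D rate-distortion curve. All codebook values and the name `select_quantizers` are invented placeholders; in the actual coder the codebook would be derived from measured rate-distortion data.

```python
from bisect import bisect_right

# Placeholder codebook: (quantizer step per layer, average bits per frame).
# Entries are sorted by increasing rate (finer quantization, lower distortion),
# so together they form one 1-D rate-distortion curve for the whole coder.
CODEBOOK = [
    ((31, 31, 31), 2_000),
    ((24, 28, 31), 3_500),
    ((16, 22, 28), 5_500),
    ((12, 16, 24), 6_400),
    ((8, 12, 18), 9_000),
    ((6, 8, 12), 12_500),
]


def select_quantizers(target_bits, codebook=CODEBOOK):
    """Return the finest quantizer vector whose rate does not exceed target_bits.

    If even the coarsest entry exceeds the budget, the coarsest entry is returned.
    """
    rates = [bits for _, bits in codebook]
    idx = bisect_right(rates, target_bits) - 1
    return codebook[max(idx, 0)][0]
```

At 64 kbps and 10 fps the per-frame budget is 6,400 bits, so `select_quantizers(64_000 // 10)` returns `(12, 16, 24)` from this placeholder codebook. Because the lookup is a binary search over a short table, its cost is negligible compared with encoding a frame.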

The manner in which the compressed bit stream is transmitted to the network has
a profound effect on queuing efficiency and therefore the bandwidth required to meet the 
required QoS. Smoothing the video traffic reduces variation and uncertainty in the 
arrival process and improves queuing efficiency. Here, a traffic shaper is employed to 
deterministically smooth the entire stream, all layers included, to maximize queuing 
efficiency. The only drawback to smoothing is the insertion of additional delay in the 
transmission path due to the need to buffer an entire encoded frame prior to transmission. 
However, a new scheme is proposed that partially offsets the delay. The traffic shaper is
also responsible for interleaving cells from each layer for transmission within the 
outgoing stream. Order of arrival into the queue appears to affect scheduling 
performance in priority-based scheduling systems [16]. 
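A minimal sketch of the shaper's two roles, interleaving cells from the layers of one buffered frame and then releasing them at a constant rate, is shown below. Interleaving places each layer's cells at evenly spaced fractional positions within the frame so that base-layer cells for a region precede the enhancement cells for that region; pacing spreads the frame's cells uniformly over the frame interval, which corresponds to draining a leaky bucket at the frame's average rate with no burst allowance. The function names are hypothetical, and the proportional interleaving rule is an assumption rather than the scheme proposed in Chapter V.

```python
from fractions import Fraction


def interleave_layers(layers):
    """Merge per-layer cell lists so each layer is spread evenly across the frame.

    layers[0] is the base layer. Cell i of a layer with n cells is placed at the
    fractional position i/n; ties go to the more important (lower-numbered) layer,
    so base-layer cells precede enhancement cells for the same region.
    Returns a list of (layer_id, cell) pairs in transmission order.
    """
    tagged = []
    for layer_id, cells in enumerate(layers):
        n = len(cells)
        for i, cell in enumerate(cells):
            tagged.append((Fraction(i, n), layer_id, cell))
    tagged.sort(key=lambda t: (t[0], t[1]))   # never compares the cells themselves
    return [(layer_id, cell) for _, layer_id, cell in tagged]


def pace_frame(cells, frame_start, frame_interval):
    """Release a buffered frame's cells at a constant rate over one frame interval."""
    if not cells:
        return []
    spacing = frame_interval / len(cells)
    return [(frame_start + k * spacing, cell) for k, cell in enumerate(cells)]


if __name__ == "__main__":
    order = interleave_layers([["b0", "b1"], ["e0", "e1", "e2", "e3"]])
    for t, cell in pace_frame(order, 0.0, 0.1):   # 10 fps -> 100 ms per frame
        print(f"{t:.4f} s  layer {cell[0]}  {cell[1]}")
```

Because pacing can begin only once the whole frame has been encoded, this simple form adds up to one frame interval (100 ms at 10 fps) of delay, which is the drawback noted above.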

Layered video traffic offers another dimension to the scheduling problem as well 
as an avenue for reducing the impact of network congestion on the overall quality of the 
reconstructed video. Since video is transmitted as a base layer and a series of 
enhancement layers, a hierarchical priority system is appropriate. During periods of no 
congestion, the layered video connection is serviced at its required QoS without regard to
the layering structure. During network congestion, emphasis is placed on servicing the
most perceptually important layers, starting with the base layer, and denying service to
the least important layers. Conceptually, the overall connection is granted a certain
bandwidth. As cell loss increases due to congestion, the bandwidth is reallocated to
support only the most important layers.
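Within a single connection, the reallocation described above amounts to strict priority across layers. The sketch below assumes a hypothetical per-interval capacity measured in cells and a queued cell count per layer, base layer first; it illustrates the idea and is not one of the scheduling algorithms of Chapter VI.

```python
def allocate_within_connection(queued, capacity):
    """Strict-priority split of a connection's cell capacity across its layers.

    queued   -- cells waiting in each layer, index 0 = base layer
    capacity -- cells the connection may send in this scheduling interval
    Returns (served, denied) lists, one entry per layer.
    """
    served, remaining = [], capacity
    for waiting in queued:
        take = min(waiting, remaining)   # more important layers are filled first
        served.append(take)
        remaining -= take
    denied = [w - s for w, s in zip(queued, served)]
    return served, denied
```

With ample capacity every layer is served in full and the layering is invisible; under congestion the entire shortfall falls on the enhancement layers, starting with the least important.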

However, the impact of an individual cell loss cannot be viewed in isolation.
Another factor to consider is the temporal dependence between adjacent cells in
ultimately reconstructing the video sequence. Both cell losses and bit errors in
transmission create gaps in the incoming bit stream, causing the decoder to lose the
synchronization required to recognize codewords within the stream. The decoder then
must parse forward within the bit stream until a marker is found to re-enable
synchronization. Therefore, if a cell is dropped from the queue, all subsequent cells up to,
but not including, the cell containing the next marker are unusable and will not be decoded.
This situation can be exploited by reacting to a cell loss by searching for related cells
rendered unusable and discarding them to open scheduling opportunities for other cells.
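The discard rule can be sketched as a scan of a layer's queue. Each hypothetical cell record carries an identifier and a flag indicating whether it contains a resynchronization marker (for example, a GOB or picture header); the sketch assumes a marker cell is decodable from its marker onward, so that cell is kept. The names are illustrative only.

```python
def discard_nonviable(queue, lost_index):
    """Drop the lost cell and every following cell up to, but not including, the next marker cell.

    queue      -- list of (cell_id, has_marker) tuples in transmission order
    lost_index -- position of the cell that was lost or dropped
    Returns (remaining_queue, discarded_ids).
    """
    if not 0 <= lost_index < len(queue):
        raise IndexError("lost_index out of range")
    discarded = [queue[lost_index][0]]
    i = lost_index + 1
    # Cells without a marker cannot be decoded until the decoder resynchronizes.
    while i < len(queue) and not queue[i][1]:
        discarded.append(queue[i][0])
        i += 1
    return queue[:lost_index] + queue[i:], discarded
```

The freed slots can then be offered to other cells, which is how discarding recovers bandwidth that would otherwise be spent on cells the decoder will ignore.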
D. DISSERTATION ORGANIZATION 


The dissertation is organized as follows. We start with a discussion of general 
multimedia network architectures and traditional video codec designs. Next, the 
elements for improving network distribution of low-bit-rate video are presented. These 
elements include design of a suitable low-complexity layered video coder for tactical 
environments, a traffic-shaping scheme to maximize queuing and scheduling efficiency, 
and network scheduling algorithms that provide QoS support for layered video while 
maximizing perceptual quality during periods of congestion. 

Chapter II begins with an overview of transmission of multimedia traffic in both 
the IP and the ATM environments. ATM and a brief discussion of related ITU standards 
for multimedia terminals are covered. Since layered video follows a strict hierarchy in 
regard to perceptual importance, identifying layers within the network is crucial to 
implementing priority-based scheduling. Also, as dropped cells may corrupt future 
portions of the video stream, either within a layer or across all layers, identifying logical 
resynchronization points within the stream allows the scheduler to make intelligent
decisions on when to discard cells. Accomplishing each of these tasks depends on
the manner in which the layered video stream is transmitted within an ATM network.
Therefore, two approaches are examined: multiplexing all layers over a single virtual 
channel or assigning individual layers to separate virtual channels. 

Chapter III provides an overview of hybrid video coding along with a brief 
introduction to the three components of video coding: transforms, quantization, and
entropy encoding. The notion of wavelet-based image compression is presented as a
motivation for layered video transmission. Chapter IV examines the problem of layered
coding for both low-activity motion video and static presentation slides. A heuristic 
approach to designing layering schemes for motion video is presented and a particular 
scheme for low-bit-rate video is proposed. As a layering structure for motion video is 
unsuited for static presentation slides, another layering structure is proposed emphasizing
the greater perceptual importance of the high-frequency content. The problem of rate
control for layered coding is examined and a simple open-loop controller is proposed.

Chapter V discusses the concept of traffic smoothing for increasing queuing 
efficiency and scheduler performance. An integrated smoothing scheme is proposed that 
smoothes traffic at three time scales: interframe, intraframe, and across the layer 
hierarchy. Implementation within the context of an ATM network is also considered. 
Chapter VI addresses the issue of scheduling layered video traffic. Several algorithms 
are proposed to maximize throughput while exploiting the opportunities provided by 
layered video to reallocate bandwidth within a connection as required to preserve the 
higher priority layers. A cell-discard policy is also discussed that accounts for the
interdependence of cells in the traffic flow, both within a layer and across layers.
Simulation results illustrating the different algorithms are presented and discussed. 

Chapter VII summarizes the significant contributions made in the dissertation and 
provides concluding remarks along with a discussion of possible topics for future 
research in layered video transmission and related areas. 

Appendix A presents the OPNET process models used to validate the behavior of 
the layered scheduling algorithms presented in Chapter VI. Appendix B presents a 
suitable video traffic model used to simulate the behavior of a rate-controlled video
traffic stream.


II. NETWORK ARCHITECTURES FOR MULTIMEDIA TRAFFIC 


Before introducing the topics of video compression and scheduling, we examine 
integrated services network architectures appropriate for video teleconferencing. We 
start by considering the characteristics of a generic m-to-n VTC application. VTC
applications are inherently real-time and interactive, transmit both continuous and discrete
media, and operate in multicast mode. The interactive and continuous nature of the
application suggests that strict bounds are required on both delay and delay jitter. Since 
both video and audio traffic are generally compressed, packet losses must be limited to 
avoid excessive reconstruction errors. Summarizing, the characteristics of VTC 
applications imply the following requirements: multicast support, QoS guarantees, and 
real-time support. Based on these requirements, two network architectures provide a 
suitable basis for VTC [3]: IP-based networking in conjunction with RTP and ATM 
networking. 

The purpose of this chapter is to refine the networking scenario underlying the 
VTC application and provide a context for the work presented in this dissertation. While 
multicast IP is briefly considered, a wireless ATM network appears more suitable for 
tactical VTC applications and is covered in far greater detail. Emphasis is placed on 
describing ATM’s support for different traffic types, QoS support, and connection setup 
using a simple layered protocol model to indicate where each level of functionality is 
implemented. The ATM cell format is examined, and an overview of ATM multicast 
implementations is presented. Two other related topics are covered in some detail: a 
brief introduction to wireless networking, focusing on the data link control and physical 
layers, and coverage of ITU multimedia terminal standards that pertain to ATM 
networks. 

The final issue considered is support of layered video traffic within the context of 
established ATM networking standards. The first problem is how to map individual
video layers onto ATM connections. All of the layers may be interleaved over a single
logical connection or transmitted separately using individual connections. The
implications of both approaches are considered and presented along with the attendant
advantages and liabilities. The second problem is facilitating layer identification within
the network to implement an appropriate scheduling algorithm. In some cases, it is also 
valuable to identify other elements within each layer, such as the positioning of frame 
and group of blocks (GOB) headers. Identification is complicated by the deliberate 
simplicity of the ATM cell header since the user has limited means for altering fields 
within the header. Two cell tagging schemes are presented to accommodate this, one for 


the single connection case and the other for the multiple connection case. 
A. LEGACY IP-BASED NETWORK 


Although the TCP/IP protocol suite is the dominant commercial architecture for
internetworking, TCP/IP is not practical for real-time, multimedia applications. Still,
IP-based networks are so prevalent that incentives exist for working within the
limitations imposed by IP to add some support for real-time traffic. The current approach
is to use RTP over UDP/IP to provide real-time support for a video application, as shown
by the protocol stack in Figure II.1. The following paragraphs consider both TCP/IP
networking and RTP over IP; the latter is termed the legacy approach to real-time
networking. TCP/IP is considered primarily to show how its design decisions, while
appropriate for the type of traffic originally envisioned, preclude real-time support.
Discussion of the lower layers is deferred until wireless networking is considered.
Figure II.1: IP-Based Network Protocol Stack for Real-time Traffic.


1. IP and Multicast IP 


In regard to real-time traffic, IP is effectively neutral. IP provides a connectionless
service to higher layers, providing only “best effort” delivery of datagrams [19].
Best-effort service does not guarantee that any data transmitted will ultimately be
delivered or arrive in any particular order. Connectionless service was chosen for IP
since datagrams traveling through different networks might encounter a variety of
protocols. By offering only an unreliable service, IP requires very few services from the
constituent networks traversed by datagrams. Any additional end-to-end services, such as
a reliable, connection-oriented service, are added by transport layer protocols, such as
TCP, if needed. However, best-effort service by definition precludes any notion of QoS.
Although higher layers may add functionality to control information loss, other QoS
parameters, such as delay and delay jitter, cannot be guaranteed. Even worse, if any part
of a network transmission path includes an IP network, no explicit QoS guarantees are
possible regardless of the capabilities of the other networks in the path.

IPv4 has been extended through various efforts to provide multicast functionality,
though support must be regarded as experimental since most IP routers currently do not
explicitly provide multicast service. The best example of multicast IP is the MBone
(multicast backbone), an outgrowth of early multicast experiments during the formulation
of the IP multicast protocol [20]. The MBone consists of a virtual network of multicast
routers, or mrouters. Multicast packets are transmitted point-to-point between mrouters,
using tunneling as necessary to traverse ordinary routers [21]. Several audio and video
tools have been written to take advantage of the MBone, but they are restricted primarily
to the Unix platform [1]. The next generation of IP, IPv6, explicitly supports multicast
functionality [18].
2. Transport Layer Protocols


TCP is a transport protocol that provides the reliable, connection-oriented service 
lacking in IP and guarantees sequential delivery of data to the application layer. 
However, this very service precludes the use of TCP for real-time multicast applications 
[18]. TCP is a point-to-point protocol; TCP connections are established between two end 
users. Reliability and sequencing are provided through a system of acknowledgments 
and retransmissions [19]. However, real-time applications have stringent delay 
requirements and retransmitted segments usually cannot arrive in time to provide a 
benefit. In this case, retransmissions merely waste bandwidth. TCP also includes a 
window-based flow control scheme to prevent faster systems from overwhelming slower 
systems with data and to implement congestion control schemes. However, the same 
scheme impedes delivery of streaming data. 

For these reasons, UDP is favored for real-time traffic, offering simple transport
layer access to IP with low overhead. While UDP provides no explicit support for
real-time applications, real-time traffic is not impeded as in the TCP case [18].
3. Real-time Transport Protocol (RTP)


RTP is a lightweight transport protocol for real-time applications and employs
UDP for access to both IP and multicast IP. RTP provides neither reliable service nor
QoS guarantees, since the underlying IP layer precludes these services. RTP does
provide a framework of services that allows the application to monitor and compensate
for the actual QoS the network is delivering to the recipients. RTP follows the concept
of application-level framing [11], which can be illustrated by the following scenario.
The sending application transmits data continuously to one or more receiving
applications. Each receiving application is able to accept less than perfect delivery and
still continue operating, thus negating the need for retransmissions. For example, a video
decoder parses past missing data and resynchronizes as required to restart decoding.
However, each receiver does monitor the QoS provided by the network, in terms of
delay, delay jitter, and packet loss, and relays the information back to the sender. Taken
collectively, the feedback reports indicate network conditions and give the sender an
opportunity to adapt in hopes of obtaining better QoS. If receivers report high packet
losses, indicating possible network congestion, the sender might move to a lower-quality
transmission to place a smaller demand on the network. To benefit from RTP, the
application must be controllable, that is, able to adjust bandwidth requirements
dynamically as dictated by network conditions. A video coder, for example, could reduce
frame rate, resolution, or perceptual quality [9].
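The feedback loop described above can be sketched as follows. This is a minimal
illustration and not part of RTP itself: the `ReceiverReport` class, the `adapt_layers`
function, and the 5% and 1% loss thresholds are hypothetical choices, and the sender is
assumed to control its rate by adding or dropping layers of a layered video stream.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReceiverReport:
    """Subset of an RTCP receiver report: fraction of packets lost (0.0-1.0)."""
    fraction_lost: float


def adapt_layers(current_layers: int, reports: List[ReceiverReport],
                 max_layers: int, drop_threshold: float = 0.05,
                 add_threshold: float = 0.01) -> int:
    """Choose how many video layers to send next (hypothetical policy).

    The sender reacts to the worst loss seen by any receiver: it sheds one
    layer when loss exceeds drop_threshold and probes for one more layer when
    loss falls below add_threshold. The base layer is never dropped.
    """
    if not reports:
        return current_layers  # no feedback yet: keep the current rate
    worst = max(r.fraction_lost for r in reports)
    if worst > drop_threshold and current_layers > 1:
        return current_layers - 1
    if worst < add_threshold and current_layers < max_layers:
        return current_layers + 1
    return current_layers


if __name__ == "__main__":
    reports = [ReceiverReport(0.00), ReceiverReport(0.12)]
    print(adapt_layers(3, reports, max_layers=4))  # 2: one receiver is congested
```

Keying on the worst receiver protects the most congested path but lowers quality for
every other receiver, which illustrates why heterogeneous receiver reports complicate
sender-based control.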

RFC 1889 specifies both a data transfer protocol, simply termed RTP, and an RTP
control protocol, RTCP [10]. RTP supports either unicast or multicast transmission by
organizing participating RTP entities into a session. Each entity transmits data to the
session through a single UDP port using an application-level packet format defined by
the protocol. RTP packet headers identify the payload type: the media type (audio or
video) and the format (e.g., G.728 audio or H.261 video) [22]. The header also carries a
synchronization source identifier that identifies the source generating the data, a
sequence number for loss detection, and a timestamp recording the time the first byte of
data was generated. The timestamp allows synchronization among different streams.
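To make the header fields concrete, the sketch below packs and parses the 12-octet
fixed RTP header defined in RFC 1889 (version 2) using Python's `struct` module. It
assumes no padding, no header extension, and no contributing-source (CSRC) list; the
function names are illustrative. Payload type 31 is the static assignment for H.261
video in the companion audio/video profile (RFC 1890).

```python
import struct

RTP_VERSION = 2
HEADER_FORMAT = "!BBHII"  # network byte order: 1+1+2+4+4 = 12 octets


def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Pack the RTP fixed header with P=0, X=0 and CC=0."""
    if not 0 <= payload_type < 128:
        raise ValueError("payload type is a 7-bit field")
    byte0 = RTP_VERSION << 6                   # V(2) P(1) X(1) CC(4)
    byte1 = (int(marker) << 7) | payload_type  # M(1) PT(7)
    return struct.pack(HEADER_FORMAT, byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Parse the first 12 octets of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(HEADER_FORMAT, data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,              # incremented per packet; gaps reveal losses
        "timestamp": timestamp,  # sampling instant of the first payload octet
        "ssrc": ssrc,            # identifies the synchronization source
    }


if __name__ == "__main__":
    hdr = pack_rtp_header(payload_type=31, seq=1000, timestamp=90000,
                          ssrc=0x1234ABCD, marker=True)
    print(len(hdr), unpack_rtp_header(hdr)["payload_type"])  # 12 31
```

Masking the sequence number to 16 bits lets a sender simply keep counting; the field
wraps around, so receivers must detect losses by comparing sequence numbers modulo 2^16.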

RTCP provides for feedback reports to sending applications as well as reports to 
all members of the multicast session [10][18]. Reports are transmitted through a separate 
UDP port from RTP packets. Receiver reports provide feedback on observed QoS to the 
sending entity. Sender reports are used to alert participants when multiple source 
identifiers are related, such as synchronized audio and video streams, and should be 
received together. Each session member also periodically sends status reports that 
collectively allow other members to estimate the size of the session. Session size is used
to scale the report transmission rate to avoid overburdening the network. 
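The scaling of the report rate can be illustrated with a simplified form of the interval
calculation in RFC 1889: RTCP traffic is held to a fixed fraction (5%) of the session
bandwidth, so the interval between one member's reports grows linearly with the number
of members, subject to a 5-second minimum. The sketch omits the random jitter applied to
the interval and other refinements specified in the RFC; the function name is
illustrative.

```python
def rtcp_interval(members: int, avg_rtcp_size: float, session_bw: float,
                  rtcp_fraction: float = 0.05, t_min: float = 5.0) -> float:
    """Deterministic RTCP report interval in seconds (simplified).

    members       -- estimated session size, learned from other members' reports
    avg_rtcp_size -- average RTCP packet size in octets
    session_bw    -- session bandwidth in octets per second
    """
    rtcp_bw = rtcp_fraction * session_bw  # octets/s shared by all reporters
    return max(t_min, members * avg_rtcp_size / rtcp_bw)


if __name__ == "__main__":
    # A 384 kbit/s session is 48000 octets/s, so RTCP may use 2400 octets/s.
    for n in (10, 200, 1000):
        print(n, rtcp_interval(n, avg_rtcp_size=120, session_bw=48000))
```

Small sessions are clamped to the minimum interval, while in large sessions each member
reports less often so that aggregate RTCP traffic stays roughly constant.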

An important point is that RTP does not provide a mechanism or algorithm for
determining how the sender interprets feedback reports and adjusts its network
demands. Instead, the application must be written to take advantage of RTP, which
suggests that RTP should be viewed more as an application framework than as a
complete networking protocol [18].
4. Suitability of RTP/IP for VTC 


The introduction to this chapter indicated three features required for a networking
architecture to fully support video conferencing. The legacy RTP/IP network architecture
provides adequate real-time and multicast support, yet the architecture falls short in two
areas. First, applications use RTCP receiver reports to mitigate the effects of congestion.
With large or heterogeneous networks, the reports may vary significantly since each
receiver experiences different network conditions. This greatly complicates the control
issue, although receiver-driven layered multicast (RLM) [45] corrects it to some extent.
Second, and more significant, the lack of QoS guarantees may lead to unsatisfactory
reconstruction of the audio and video streams. IP routers do not guarantee QoS since IP
routing does not incorporate the concept of resource reservation and only provides
service through variants of first-come, first-served (FCFS) scheduling. The new Resource
reSerVation Protocol (RSVP) has been developed to provide support for QoS under the
proposed Integrated Services Architecture (ISA) [18]. Each router running RSVP must
implement an admission control scheme and a scheduling scheme, and must be able to
classify packets according to QoS requirements. At this point, RSVP is not widely
implemented, and its capabilities are already duplicated by the more mature ATM network
architecture.
B. ATM NETWORKS


ATM grew out of the desire to utilize the high bandwidth available from optical
fiber to create a Broadband Integrated Services Digital Network (B-ISDN) that is able to
support audio, video, and data services within the same network [27]. In contrast to
TCP/IP, where the end-user transport layers provide only reliable service and network
delivery is best effort, ATM networks provide QoS guarantees. ATM guarantees QoS by
comparing the caller’s QoS requirements to available network resources and then
allowing a connection only if sufficient resources exist [18]. Resources are reserved for
the duration of the connection. ATM distinguishes among several different service or
traffic classes, such as real-time and non-real-time traffic at constant and variable bit
rates, and provides support through a combination of QoS primitives and transport layer
adaptation.

ATM represents a middle ground between PSTN circuit-switched networks and
connectionless packet-switched networks. ATM uses virtual circuits to simplify
switching decisions but allows several connections to be multiplexed over a single
physical interface to promote efficient bandwidth utilization. Virtual circuits imply
connection-oriented service, but ATM also provides the equivalent of connectionless
service to support the widest range of applications possible.

ATM was designed to support high bit rate connections, such as OC-3 (155
Mbps) and OC-12 (622 Mbps), over fiber [23][24]. The decision to employ fiber, a
physical medium with extremely small bit error rates (BER), allows ATM to minimize
both error and flow control functionality. Minimizing these capabilities reduces overhead
in processing ATM cells and decreases the header bits required per cell, thus allowing
fast switching speeds and efficient data transport. High-speed switching is further
supported by the use of small, fixed-length cells.
1. ATM Protocol Model


The ATM protocol model is shown in Figure II.2 [23]. The protocol model
consists of three separate planes: management, control, and user. The management plane
provides management functions and exchanges information between the control and user
planes. The control plane deals with call establishment, connection control, and call
release. To provide these functions, the control plane has access to the network through
separate signaling protocols and cell definitions. The user plane supports transfer of user
information by providing such functionality as flow and error control, timestamps for
synchronization, and sequencing.

The user plane includes the ATM Adaptation Layer (AAL), the ATM layer, and
the physical layer. The AAL is a service-dependent layer that adapts information streams
from higher layers for transmission over ATM. Example streams include compressed
video, constant bit rate (CBR) audio, or even IP datagrams, each with distinct service
requirements. The AAL maps data and service requirements from these streams to
services provided by the ATM layer. The ATM layer provides data transport using cells
over an end-to-end logical connection and controls access to the underlying physical
layer.

The physical layer is medium dependent and includes two sublayers: the
physical-media dependent (PMD) sublayer and the transmission-convergence (TC)
sublayer [23]. The former deals with aspects that are dependent on the transmission
medium selected (e.g., bit timing and line coding). The latter handles issues that are
independent of the transmission medium characteristics, such as error control or
determination of cell boundaries in the physical layer payload. ATM specifies SONET, a
fiber standard that provides synchronous time-multiplexed transmission at high bit rates,
as the basic physical layer interface. Other physical layer interfaces, such as UTP
[25][26], are specified to promote interoperability.
Figure II.2: ATM Protocol Architecture [23].


2. Logical Connections


End-to-end connections in ATM are defined in terms of a virtual channel
connection (VCC) and a virtual path connection (VPC). Figure II.3 illustrates the role of
VCCs and VPCs within an ATM network. VCCs are created dynamically between two
end users to provide a unidirectional channel for ATM cells carrying user data and are
terminated at call release. Cells are carried in sequence. VCCs are also set up between
an end user and the network to carry control signals and between network nodes to
facilitate network management and routing. These connections cross the user-network
interface (UNI) and the network-network interface (NNI), respectively.

Figure II.3: ATM Network Configuration [27].


ATM networks include a higher level of connectivity in the form of virtual paths.
The virtual path concept is motivated by the trend toward increasing bandwidth, which
also increases the number of connections a channel may carry. Compared to IP
networks, ATM’s circuit-oriented structure and QoS guarantees incur greater control
costs. Since these control costs scale with the number of connections, virtual paths
decrease cost by reducing the number of connections managed by the network. A VPC
is a network-defined, end-to-end connection that follows a set route through the network
and provides a specified QoS, such as bandwidth. Each VPC carries multiple VCCs with
the same end-points, and all associated cells are switched along the same path. Since
most of the work required to establish a connection (reserving capacity and calculating
routes) is performed when a VPC is established, call setup time for new VCCs is greatly
reduced.
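A virtual path switch can be sketched as a table lookup that translates only the VPI.
The port numbers, table contents, and function name below are hypothetical; the point
illustrated is that the VCI passes through unchanged, so a new VCC inside an existing
VPC requires no new entries in the switches along the path.

```python
from typing import Dict, Tuple

# (input port, incoming VPI) -> (output port, outgoing VPI); VCIs never appear.
VpTable = Dict[Tuple[int, int], Tuple[int, int]]


def vp_switch(table: VpTable, in_port: int, vpi: int,
              vci: int) -> Tuple[int, int, int]:
    """Forward one cell through a virtual path cross-connect.

    Only the VPI is looked up and rewritten; the VCI is carried through
    untouched. A KeyError means no VPC was provisioned for this VPI.
    """
    out_port, out_vpi = table[(in_port, vpi)]
    return out_port, out_vpi, vci


if __name__ == "__main__":
    table: VpTable = {(1, 5): (3, 17)}  # one provisioned VPC
    # Two different VCCs in the same VPC follow the same route.
    print(vp_switch(table, 1, 5, 42))   # (3, 17, 42)
    print(vp_switch(table, 1, 5, 43))   # (3, 17, 43)
```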
3. ATM Cell Format


ATM employs fixed-size cells consisting of a 5-octet header and a 48-octet 
information field. The cell header format differs depending on whether the cell is 
entering the network (UNI) or moving within the network (NNI). Figure II.4 shows the 
ATM cell format at the UNI. NNI ATM cells do not retain the generic flow control 
(GFC) field; instead they use the bits to expand the virtual path identifier (VPI) from 8 to 
12 bits. 
Figure II.4: ATM Cell Format at the UNI [23].


The GFC field is intended to control cell flow at the UNI, although its use
remains an area of active study [18]. The GFC is not carried end-to-end and is
overwritten by ATM switches to expand the VPI.

The VPI identifies a routing path within the network. The field width is 8 bits at
the UNI and 12 bits at the NNI, thereby allowing a greater number of virtual paths
within the network. The virtual channel identifier (VCI) identifies a virtual channel
within a virtual path and functions similarly to the ports in TCP or UDP.
The payload type (PT) is a 3-bit field used to indicate the type of data in the
information field. A high-order bit of 0 indicates a user data cell; 1 indicates either a
resource management (RM) cell or a cell carrying maintenance information. The second
bit is initially cleared at the UNI. Within the NNI, a switch sets the second bit whenever
congestion is experienced, and downstream switches can monitor this bit to gauge network
conditions. The third bit is the service data unit (SDU) type bit and allows the user to
designate two types of SDUs. One use of the SDU bit is to implement different service
strategies for ATM cells based on their content.

The cell loss priority (CLP) field is set by the user to indicate the relative priority
of cells in case congestion forces a switch to discard cells. A value of 0 indicates higher
priority, and the cell should be dropped only as a last resort; 1 indicates a lower priority
cell that a switch may drop to ease congestion. As part of call setup, the user negotiates a
contract with the network and agrees to transmit data in accordance with various traffic
parameters. The user may negotiate separate contracts for CLP = 0 and CLP = 1 traffic.
Network switches also set the CLP bit for any data cell in violation of its traffic contract,
even if the switch has sufficient capacity to transmit the cell. Subsequent switches may
then discard the cell as required.

ATM cells include an 8-bit header error control (HEC) field calculated based on
the first four octets of the header. The HEC allows detection of errors and correction of
single-bit errors. If a multi-bit error is detected, the cell is discarded. No error detection
is provided for the information field.
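The UNI header layout and the HEC computation can be sketched as follows. The HEC is
a CRC-8 with generator polynomial x^8 + x^2 + x + 1 computed over the first four header
octets, with the result XORed with the pattern 01010101 as specified in ITU-T I.432. The
function names are illustrative, and the decoder only detects HEC errors; single-bit
correction is omitted to keep the sketch short.

```python
def hec(first_four: bytes) -> int:
    """HEC octet: CRC-8 (x^8 + x^2 + x + 1) over header octets 1-4, XOR 0x55."""
    crc = 0
    for byte in first_four:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ 0x55


def pack_uni_header(gfc: int, vpi: int, vci: int, pt: int, clp: int) -> bytes:
    """Build a 5-octet UNI header: GFC(4) VPI(8) VCI(16) PT(3) CLP(1) HEC(8)."""
    word = (((gfc & 0xF) << 28) | ((vpi & 0xFF) << 20)
            | ((vci & 0xFFFF) << 4) | ((pt & 0x7) << 1) | (clp & 0x1))
    first_four = word.to_bytes(4, "big")
    return first_four + bytes([hec(first_four)])


def unpack_uni_header(header: bytes) -> dict:
    """Split a 5-octet UNI header into fields after checking the HEC."""
    if hec(header[:4]) != header[4]:
        raise ValueError("HEC mismatch: header corrupted")
    word = int.from_bytes(header[:4], "big")
    return {
        "gfc": word >> 28,
        "vpi": (word >> 20) & 0xFF,
        "vci": (word >> 4) & 0xFFFF,
        "pt": (word >> 1) & 0x7,  # high bit 0 = user data; low bit = SDU type
        "clp": word & 0x1,        # 1 = may be discarded first under congestion
    }


if __name__ == "__main__":
    idle = pack_uni_header(gfc=0, vpi=0, vci=0, pt=0, clp=1)
    print(idle.hex())  # 0000000152, the idle-cell header of ITU-T I.432
```

At the NNI the same CRC applies, but the four GFC bits are absorbed into a 12-bit VPI.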
4. ATM Service Classes 


ATM is designed to support a wide range of applications: from interactive
applications, such as video and multimedia conferencing, to distribution services, such as
archive retrieval and document browsing [27]. Recall that each application transmits a
sequence of cells through a virtual channel connection. Providing the desired QoS to a
new VCC depends on the new connection’s traffic flow characteristics as well as the
characteristics of existing VCCs. Traffic handling, from call acceptance to network
scheduling, is therefore simplified by defining discrete service categories. The ATM
Forum has defined five ATM layer service classes, as shown in Table II.1 [28]. Each
VCC established receives service in accordance with one of these categories.


Interactivity          | Service Class
Real-time service      | Constant bit rate (CBR)
                       | Real-time variable bit rate (rt-VBR)
Non-real-time service  | Non-real-time variable bit rate (nrt-VBR)
                       | Available bit rate (ABR)
                       | Unspecified bit rate (UBR)

Table II.1: ATM Service Classes [28].


Real-time services are characterized by low tolerance for delay and delay jitter.
Applications that involve human interactivity, such as video conferencing, are real-time
since excessive delay degrades the perception of true interactivity and jitter impedes the
smooth playback of audio and video. The two real-time services, CBR and rt-VBR, are
distinguished by variation in data rate. CBR, as expected, transmits data at a fixed rate
and is the easiest service to support. Applications include both compressed and
uncompressed data. Toll-quality PCM speech requires a constant data rate of 64 kbps.
H.261 was designed to support transmission over one or more ISDN B channels and
compresses video at a multiple of 64 kbps. CBR is commonly employed for uncompressed
applications, such as broadcast-quality video conferencing and interactive audio. Rt-VBR
applications have data rates that are “bursty” and time-varying and are characterized by
a mean bit rate and a peak bit rate. Compressed video is inherently VBR since
compression gain naturally varies with each frame depending on scene content (see
Chapter III). Rt-VBR is more difficult for networks to support but provides greater
flexibility than CBR. VBR streams may be statistically multiplexed over the same
channel for more efficient use of bandwidth.

Non-real-time services are intended for bursty traffic without stringent
requirements on delay and jitter, thus giving a network more flexibility in dealing with
these traffic flows. Nrt-VBR applications generate VBR data that does not require strict
limits on delay but does require some upper bound. Examples include banking and
airline transactions [18]. UBR service is a best-effort service similar to that provided by
IP-based networks. UBR connections receive no dedicated resources: bandwidth is
provided dynamically from spare channel capacity not utilized by CBR and VBR traffic.
ABR improves upon UBR’s best-effort service. ABR applications specify both a
minimum cell rate (MCR) and a peak cell rate (PCR). At any time, the network ensures a
fair allocation of resources among all ABR connections such that each connection
receives at least its MCR, and possibly up to its PCR, depending on available capacity.
TCP connections and LAN traffic commonly employ ABR service. Figure II.5 shows
how channel capacity could be allocated to each service category.
Figure II.5: Bandwidth Allocation for ATM Service Categories [18].


At call setup, a user requests service by supplying the network with traffic
descriptors that characterize the cell flow and the required QoS. The exact parameters
provided are service dependent. Traffic descriptors allow the network to determine
whether sufficient resources are available to support the connection’s QoS requirements.
For example, a user requesting rt-VBR service must supply the PCR, the sustainable cell
rate (SCR), and the maximum burst size (MBS). A CBR connection provides only the
PCR. The QoS desired is specified in terms of cell delay variation (CDV), maximum cell
transfer delay (maxCTD), and cell loss ratio (CLR). Real-time services require that all
three QoS parameters be specified. Non-real-time services do not specify any QoS
parameters except for nrt-VBR, which specifies CLR.
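The parameter lists above can be captured in a small lookup, together with the
arithmetic for a CBR peak cell rate. This is a hypothetical sketch: the dictionary covers
only the two real-time classes whose parameters are listed in the text, and the PCR
calculation assumes AAL1, whose one-octet SAR header leaves 47 of the 48 information
octets for user data.

```python
import math
from typing import Dict, Set

# Descriptors plus QoS parameters the text lists for the real-time classes.
REQUIRED_PARAMS: Dict[str, Set[str]] = {
    "CBR": {"PCR", "CDV", "maxCTD", "CLR"},
    "rt-VBR": {"PCR", "SCR", "MBS", "CDV", "maxCTD", "CLR"},
}


def missing_params(service: str, request: Dict[str, float]) -> Set[str]:
    """Return the parameters a setup request still lacks for its class."""
    return REQUIRED_PARAMS[service] - set(request)


def cbr_pcr(bit_rate_bps: float, payload_octets: int = 47) -> int:
    """Cells per second needed to carry a constant bit rate stream."""
    return math.ceil(bit_rate_bps / (8 * payload_octets))


if __name__ == "__main__":
    print(cbr_pcr(64_000))  # 171 cells/s for 64 kbps PCM over AAL1
    print(missing_params("rt-VBR", {"PCR": 400, "SCR": 150}))
```

Rounding up matters: a PCR one cell per second too low would make a compliant 64 kbps
source appear to violate its own contract.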


A connection is accepted only if the network can reserve sufficient resources while
maintaining the QoS of existing connections. If the connection is accepted, the traffic
descriptors and QoS parameters form a traffic contract between the user and network.
The user agrees to transmit in accordance with the traffic parameters. In turn, the
network guarantees the QoS parameters for the duration of the connection. Once the
connection is active, the network performs traffic policing to ensure compliance. If the
user violates the traffic contract, perhaps by exceeding the SCR, offending cells may be
tagged using the CLP bit or discarded.
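Traffic policing is commonly specified with the Generic Cell Rate Algorithm (GCRA),
a form of leaky bucket defined by the ATM Forum. The sketch below implements the
virtual scheduling form of GCRA(T, τ), where the increment T is the expected inter-cell
time (1/PCR) and the limit τ is the tolerance. The class name, the integer time units, and
the choice to tag non-conforming cells with CLP = 1 rather than discard them are
assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gcra:
    """GCRA(T, tau) in virtual scheduling form."""
    increment: float     # T: nominal time between conforming cells
    limit: float         # tau: how early a cell may arrive and still conform
    tat: float = 0.0     # theoretical arrival time of the next cell
    started: bool = False

    def conforms(self, arrival: float) -> bool:
        if not self.started:          # first cell defines the schedule
            self.tat = arrival
            self.started = True
        if arrival < self.tat - self.limit:
            return False              # too early; TAT is left unchanged
        self.tat = max(arrival, self.tat) + self.increment
        return True


def police(arrivals: List[float], gcra: Gcra) -> List[int]:
    """CLP value for each cell: conforming cells keep 0, others are tagged 1."""
    return [0 if gcra.conforms(t) else 1 for t in arrivals]


if __name__ == "__main__":
    # T = 10 time units, tau = 2: the cell at t = 15 is 5 units early.
    print(police([0, 10, 15, 18, 30], Gcra(increment=10, limit=2)))
```

Because a non-conforming cell does not advance the theoretical arrival time, a single
early cell is tagged without penalizing the cells that follow it.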
5. ATM Adaptation Layer (AAL)


Referring back to Figure II.2, the AAL provides services to applications or other
transfer protocols that are not found in the ATM layer. To minimize the number of AAL
protocols required, ITU-T Recommendation I.121 defines four generic service classes¹,
A-D, based on three application service requirements [29]: bit rate (constant or variable),
the timing relationship between the source and receiver (required or not), and the
connection mode (connectionless or connection-oriented). These service classes are
more general than the previously described ATM layer service classes and include
neither formal traffic descriptors nor QoS parameters. In addition to these application
service requirements, ITU-T Recommendation I.362² lists example services that the
AAL may provide to enhance the ATM layer, including [30]: handling transmission
errors, segmentation and reassembly to map user data to the 48-octet information field in
ATM cells, handling lost and misinserted cells, and flow and timing control.

To distinguish between data handling and service-dependent functionality, the
AAL is divided into two sublayers. The convergence sublayer (CS) provides
service-dependent functions and a service access point (SAP) for applications.
Functionality within the CS is further differentiated into the service specific CS (SSCS)
and the common part CS (CPCS). Discussion here focuses on the CS as a composite
entity; the SSCS and CPCS are individually addressed only when required. The
segmentation and reassembly (SAR) sublayer segments user data to fit within the
48-octet length of the ATM cell information field and reassembles user data correctly at
the destination.

Segmentation is shown in Figure II.6. The higher layer delivers a protocol data
unit to the CS. The CS adds a header, a trailer, or both and pads the CS-PDU as
required. The SAR breaks up the CS-PDU and optionally adds a header and/or a trailer to
each segment such that the resulting SAR-PDU is 48 octets in length. The SAR-PDU
then fits within a single ATM cell for transmission. At the receiver, each of these steps is
simply reversed.

¹ Two other classes, X and Y, considered for a raw cell delivery service, have been dropped.
² I.362 has been superseded by the ITU-T F.600 and F.700 series recommendations.
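For AAL5, the CS step consists of padding the user data and appending an 8-octet
trailer (a 1-octet user-to-user field, a 1-octet common part indicator, a 2-octet length,
and a 4-octet CRC-32) so that the CPCS-PDU is a multiple of 48 octets. The SAR then
slices it into 48-octet segments with no per-segment header, and the last cell of each
PDU is marked by setting the SDU type bit in the PT field. The sketch below assumes
that layout; the function names are illustrative, and the CRC is the MSB-first CRC-32
with generator 0x04C11DB7 and all-ones initial value and final XOR.

```python
from typing import List


def crc32_aal5(data: bytes) -> int:
    """MSB-first CRC-32 (generator 0x04C11DB7, init/xorout 0xFFFFFFFF)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF


def aal5_segment(payload: bytes, uu: int = 0, cpi: int = 0) -> List[bytes]:
    """Build an AAL5 CPCS-PDU and split it into 48-octet SAR-PDUs."""
    if len(payload) > 0xFFFF:
        raise ValueError("AAL5 length field is 16 bits")
    pad_len = (-(len(payload) + 8)) % 48          # fill the last cell exactly
    body = (payload + bytes(pad_len) + bytes([uu, cpi])
            + len(payload).to_bytes(2, "big"))    # length excludes the padding
    pdu = body + crc32_aal5(body).to_bytes(4, "big")
    return [pdu[i:i + 48] for i in range(0, len(pdu), 48)]


if __name__ == "__main__":
    cells = aal5_segment(bytes(100))
    # 100 octets of data + 36 padding + 8 trailer = 144 octets = 3 cells.
    print(len(cells), [len(c) for c in cells])
```

Because the trailer sits in the last cell, the receiver can locate the length and CRC
only after seeing the cell whose SDU type bit marks the end of the PDU.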
Figure II.6: Segmentation at the AAL [18].


The ITU-T originally proposed five AAL protocols [31], Types 1 to 5, but later
combined Types 3 and 4. The relationship between the generic service classes proposed
by I.161 and the AAL protocols is shown in Table II.2; the protocols do not necessarily
map to individual service classes.

                 | Class A             | Class B             | Class C             | Class D
Timing Relation  | Required            | Required            | Not Required        | Not Required
Bit Rate         | Constant            | Variable            | Variable            | Variable
Connection Mode  | Connection-oriented | Connection-oriented | Connection-oriented | Connectionless
AAL Protocol     | Type 1              | Type 2              | Type 3/4, Type 5    | Type 3/4

Table II.2: AAL Protocol Mapping to Service Classes [18].


The AAL protocols in Table II.2 map in an interesting manner to the ATM layer
service classes shown in Table II.1. The most widely used protocols are AAL1 and
AAL5. AAL1 is for connection-oriented CBR traffic, matching the ATM layer CBR
service. AAL5 is also connection oriented but supports VBR traffic. AAL5 assumes that
higher layers perform connection management and that the ATM layer produces minimal
errors. As a result, AAL5 has low processing and transmission overhead and adapts well
to existing transport protocols, such as TCP. These features make AAL5 the most
versatile AAL protocol, and AAL5 is used with all of the non-real-time ATM layer
services.

The remaining ATM layer service is rt-VBR. AAL1 is not appropriate for
rt-VBR. For the reasons stated above, AAL5 is the simplest protocol for transmitting
video. AAL3/4 provides better support for streaming data with low delay. However,
AAL3/4 integrates poorly with most processor architectures [32], is more complicated
than AAL5, and demands more processing and overhead. For this reason, AAL3/4 seems
relegated to specialized applications and has been replaced by AAL5. AAL2 appears to
be the most appropriate choice, but delays in developing the specification have slowed its
adoption. Choosing the correct protocol depends on the specific application and a
reasonable expectation of vendor support. A more complete description of each protocol
is available in [18], except for AAL2, which is covered by ITU-T Recommendation
I.363.2 [33].


6. ATM Multicast 


Based on end-to-end connectivity, multimedia applications fall into three
categories: point-to-point, point-to-multipoint, and multipoint-to-multipoint. Multimedia
applications such as videophone or Internet telephony fall into the point-to-point
category. Video on demand or remote broadcasting falls into the point-to-multipoint
category. Finally, video conferencing falls into the multipoint-to-multipoint category. The
latter two categories present a great challenge due to the need to switch video streams
efficiently to avoid network loading and the additional delay added by cell duplication or
readdressing [34]. The approach taken in ATM is somewhat different from multicast IP
due to ATM’s virtual circuit structure.

The ATM UNI 3.1 standard [23] specifies both point-to-point connections and
point-to-multipoint connections. The motivation behind a point-to-multipoint connection
is to conserve bandwidth by minimizing the number of VCIs required within the NNI.
For example, if an end-user wishes to transmit to N other users, separate point-to-point
connections would require N separate VCIs, each with the same bandwidth requirements.
A point-to-multipoint connection allows VCIs to be consolidated within the NNI when
they have common end-points. A point-to-multipoint VCC has the following properties.
First, the multicast group resembles a tree with the sender as the root node and the
receivers as leaf nodes. Second, the connection between the root and the leaves is
defined by a single VPI/VCI at the UNI. Cells transmitted by the root are received by all
of the leaves, assuming no losses in transmission. No bandwidth is allocated for
transmission from the leaves to the root; the connection is one-way. A one-way
connection is required since the root node has no mechanism for filtering data from each
leaf over a single VCI³. Under UNI 3.1, a point-to-multipoint connection is set up as a
point-to-point connection between the sender and the first leaf node. The root node then
adds additional leaves until the multicast group is complete. Leaf nodes may be dropped,
either by their own request or by the root node, but leaves may not add themselves to the
circuit. A point-to-multipoint multicast scenario is shown in Figure II.7.

³ This is possible using AAL3/4 but does not appear to be practical.
Figure II.7: ATM Point-to-multipoint Multicast.
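The UNI 3.1 rules for a point-to-multipoint connection can be summarized in a toy
model. The class and method names below are hypothetical and model only the
membership and direction constraints described in the text, not the signaling messages
themselves.

```python
from typing import Dict


class PointToMultipointVcc:
    """Toy model of the UNI 3.1 point-to-multipoint rules (hypothetical class)."""

    def __init__(self, root: str, first_leaf: str) -> None:
        self.root = root
        # The call starts life as a point-to-point connection to one leaf.
        self.leaves = {first_leaf}

    def add_party(self, requester: str, leaf: str) -> None:
        """Only the root may add leaves; leaves cannot join on their own."""
        if requester != self.root:
            raise PermissionError("only the root may add a leaf")
        self.leaves.add(leaf)

    def drop_party(self, requester: str, leaf: str) -> None:
        """A leaf may be dropped by the root or at its own request."""
        if requester not in (self.root, leaf):
            raise PermissionError("only the root or the leaf itself may drop a leaf")
        self.leaves.discard(leaf)

    def send(self, sender: str, cell: bytes) -> Dict[str, bytes]:
        """Deliver a cell to every leaf; the connection is one-way."""
        if sender != self.root:
            raise PermissionError("leaves have no bandwidth toward the root")
        return {leaf: cell for leaf in self.leaves}


if __name__ == "__main__":
    vcc = PointToMultipointVcc(root="A", first_leaf="B")
    vcc.add_party("A", "C")
    print(sorted(vcc.send("A", b"cell")))  # ['B', 'C']
```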


UNI 3.1 does not provide a specification for a multipoint-to-multipoint
connection. A multipoint-to-multipoint VCC has properties similar to the point-to-multipoint
VCC, with one important difference. The connection is defined by a single VPI/VCI
at the UNI. All cells transmitted by one endpoint of the connection are delivered to all
other endpoints, and each endpoint is capable of receiving cells over the same VCC from
any of the other connected endpoints. This duplex transmission leads to several
difficulties [35]. First, data cells from different sources arrive at the endpoint interleaved
and must be properly reassembled by the AAL. AAL1 and AAL5 do not provide this
capability [31]. AAL3/4 has a multiplexing identifier (MID) field that allows
multiplexing within a VCI, but there is no standard for assigning MID values. The small
size of the MID field restricts multicast group size, and AAL3/4 requires a great deal of
overhead [18][32]. The second problem is resource management. A VCC is granted
only if sufficient network resources exist over the transmission path. With a
multipoint-to-multipoint connection, the VCC is shared by a number of sources, and
determining the bandwidth requirements is difficult.
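The reassembly problem can be illustrated with a toy model. The sketch below is hypothetical: each cell is reduced to a tuple of (MID, payload, end-of-message flag). An AAL5-style receiver, which has no per-source identifier, appends every cell on the VCC to a single buffer, while an AAL3/4-style receiver keeps one buffer per MID value (the MID occupies 10 bits of the AAL3/4 SAR header, which is what bounds the number of concurrent senders).

```python
from collections import defaultdict

# Hypothetical cell model: (mid, payload, end_of_message).
# An AAL5-style receiver ignores the MID; an AAL3/4-style receiver keys on it.

def reassemble_aal5_style(cells):
    """No per-source identifier: all cells go into one buffer until end-of-message."""
    pdus, buf = [], []
    for _mid, payload, eom in cells:
        buf.append(payload)
        if eom:
            pdus.append("".join(buf))
            buf = []
    return pdus

def reassemble_by_mid(cells):
    """One reassembly buffer per MID value."""
    buffers, pdus = defaultdict(list), []
    for mid, payload, eom in cells:
        buffers[mid].append(payload)
        if eom:
            pdus.append("".join(buffers.pop(mid)))
    return pdus

# Two senders (MID 1 and MID 2) interleaved on one shared VCC
cells = [(1, "a1", False), (2, "b1", False), (1, "a2", True), (2, "b2", True)]

print(reassemble_aal5_style(cells))  # ['a1b1a2', 'b2']  (PDUs corrupted)
print(reassemble_by_mid(cells))      # ['a1a2', 'b1b2']
print(2 ** 10)                       # 1024 distinct MID values
```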

Various proposals have been made to implement multipoint-to-multipoint
connections within ATM. The simplest method for implementing a multipoint-to-multipoint
connection is a “forest of trees,” that is, using a point-to-multipoint connection
per endpoint [32][36]. With N endpoints, every endpoint is the root of a
point-to-multipoint connection with N-1 leaves. A “forest of trees” offers low latency per
network node, but a member entering or exiting from the multicast group causes a burst
of signaling messages. This approach is specified by the ITU-T H.3XX multimedia
conferencing standards. Another approach is to use a server as an intermediary [37].
Each endpoint transmits data over a point-to-point connection with the server. The server
relays the data to the other endpoints through a point-to-multipoint connection for which
it is the root node. The Shared Many-to-many ATM ReservaTions Protocol (SMART)
[35] is a novel ATM-layer protocol that regulates access to the multicast tree.
SMART requires only one VCC for the entire multicast group, although more VCCs are
allowed to support concurrent data transfer by two or more endpoints. Access to the
shared VCC is provided by a grant mechanism implemented in a round-robin fashion.
The SMART protocol has proven viable for multicast VTC traffic with suitable
modifications to the grant mechanism to account for the needs of real-time traffic [38].
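The signaling burst of the “forest of trees” can be quantified with a rough count. The sketch below assumes UNI 3.1-style tree construction (one SETUP to the first leaf, then one ADD PARTY per additional leaf) and counts only these requests, ignoring acknowledgments and per-hop messages; the function names are illustrative.

```python
def forest_size(n):
    """Number of point-to-multipoint trees and total leaf attachments for n endpoints."""
    return n, n * (n - 1)

def join_requests(n_existing):
    """Connection requests triggered when one endpoint joins n_existing members.

    Assumed model: a new tree costs one SETUP plus one ADD PARTY per extra leaf,
    and every existing root issues one ADD PARTY for the newcomer.
    Acknowledgments and hop-by-hop signaling are not counted.
    """
    new_tree = 1 + (n_existing - 1) if n_existing > 0 else 0
    existing_trees = n_existing
    return new_tree + existing_trees

for n in (4, 8, 16):
    print(n, forest_size(n), join_requests(n))
```

Under this model a single join costs about 2N requests, all issued at once, which is the burst referred to above.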
C. WIRELESS NETWORKS


The typical military wireless network is based on packet-radio technology that
extends the concept of the point-to-point packet-switched network to a broadcast radio
medium. As in some LAN standards, such as Ethernet, the radio channel is inherently a
multiple-access medium, but it provides a much less reliable transfer medium than
wireline networks. As shown in Figure II.8, the data link control (DLC)
layer provides service to higher-layer protocols, such as IP and ATM, by transferring data
in packets or cells over the radio medium. The DLC specifically provides reliable
transfer of information across the physical link and regulates access to the shared
medium. The functionality of the DLC is separated into the logical link control (LLC)
and medium access control (MAC) sublayers. While the functionality of the layers
shown in Figure II.8 is described briefly below, a more thorough discussion of
packet-based radio networks can be found in [39].

Figure II.8: DLC for a Packet-Based Radio Network.


1. Logical Link Control 


The LLC layer provides an interface to the network layer, either IP or ATM for 
example, and performs error and flow control. Error control involves providing 
mechanisms for responding to errors in transmitted frames while flow control regulates 
the flow of frames to ensure the sender does not overwhelm the receiver. Errors occur 
due to bit or burst errors during transit, which either damage the frame or cause the frame 
to be unrecognizable. Error control is usually provided by an automatic repeat request 
(ARQ) mechanism that combines error detection from the MAC with positive and 
negative acknowledgements and retransmission after timeout. For real-time traffic, the 
viability of the ARQ mechanism depends on the overall delay budget, and the LLC may
confine itself to dropping the corrupt data packets. The LLC layer may also attempt to
correct errors if forward error correction (FEC) coding is employed. Another possibility
is to perform power management at the LLC layer to vary transmission power in response
to observed error rates.
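Whether ARQ fits a real-time delay budget can be estimated with simple arithmetic. The sketch below uses a hypothetical model in which each failed attempt costs one frame transmission time plus a retransmission timeout, and the final successful attempt costs one frame time; propagation and queueing are folded into the timeout. The numbers are illustrative, not taken from the scenario requirements.

```python
def max_arq_attempts(delay_budget_ms, frame_time_ms, timeout_ms):
    """Largest number of transmission attempts k that fits in the delay budget.

    Assumed model: k*frame_time + (k - 1)*timeout <= delay_budget,
    i.e. k <= (budget + timeout) / (frame_time + timeout).
    """
    return int((delay_budget_ms + timeout_ms) // (frame_time_ms + timeout_ms))

# Illustrative values: a 150 ms budget allows two retransmissions,
# a 30 ms budget allows only the original transmission (ARQ is useless).
print(max_arq_attempts(150, 10, 40))  # 3
print(max_arq_attempts(30, 10, 40))   # 1
```

When the result is 1, retransmission cannot help and the LLC would simply drop corrupt frames, as noted above.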
2. Medium Access Control


The MAC governs access to the transmission medium, performs conflict 
resolution and provides error detection. A MAC protocol is either centralized, where a 
controller grants access to the network, or decentralized wherein all stations dynamically 
determine access. Various protocols are available to control access including round robin 
or polling, reservation, and contention. With polling protocols, each station is given an 
opportunity to transmit in turn. Reservation schemes are more suitable for stream traffic 
and divide access time into slots, which allows stations to reserve slots when data is ready 
for transmission. Contention schemes work well for bursty traffic where all stations 
attempt to seize control of the medium and back off when collisions occur. Contention
works well only for lightly loaded networks. Of the three schemes, reservation provides
the greatest throughput and least delay for integrated wireless networks. Slot-based 
reservation schemes for wireless ATM networks and mobile IP networks have been 
proposed by [39] and [40], respectively. 
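A reservation MAC can be pictured as a slotted frame in which stations request slots and a controller assigns them. The sketch below is a hypothetical allocation policy (round-robin, one slot at a time, until the frame is full), not the schemes proposed in [39] or [40]; it only shows how reserved slots give each station a predictable share of the frame.

```python
def allocate_slots(requests, slots_per_frame):
    """Assign reservation slots in one frame.

    requests: dict mapping station -> number of slots requested.
    Hypothetical policy: serve stations round-robin, one slot per pass,
    until the frame is full or all requests are satisfied.
    Returns (slot schedule, unmet requests).
    """
    schedule = []
    remaining = dict(requests)
    while len(schedule) < slots_per_frame and any(remaining.values()):
        for station in sorted(remaining):
            if len(schedule) == slots_per_frame:
                break
            if remaining[station] > 0:
                schedule.append(station)
                remaining[station] -= 1
    return schedule, remaining

schedule, unmet = allocate_slots({"A": 3, "B": 1, "C": 2}, slots_per_frame=5)
print(schedule)  # ['A', 'B', 'C', 'A', 'C']
print(unmet)     # {'A': 1, 'B': 0, 'C': 0}
```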

Referring back to Figure II.8, information flows in the following manner. The
network layer passes cells or packets to the LLC. The LLC appends a control header,
creating an LLC-PDU. The control header provides the data required for flow control
and error control. The LLC-PDU is passed to the MAC, which assembles a frame
containing one or more LLC-PDUs along with address and error detection fields. Once
access is granted to the radio medium, the frame is transmitted in order by the physical
layer.
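The encapsulation steps can be sketched as follows. All header layouts here are hypothetical (a 2-octet LLC sequence number; a MAC frame with 1-octet destination and source addresses, a PDU count, length-prefixed LLC-PDUs, and a CRC-32 trailer computed with Python's `zlib.crc32`); real packet-radio DLC formats differ.

```python
import struct
import zlib

def make_llc_pdu(seq, payload):
    # Hypothetical LLC control header: 16-bit sequence number for ARQ/flow control
    return struct.pack("!H", seq & 0xFFFF) + payload

def make_mac_frame(dst, src, llc_pdus):
    # Hypothetical MAC frame: dst, src, PDU count, length-prefixed LLC-PDUs, CRC-32
    body = struct.pack("!BBB", dst, src, len(llc_pdus))
    for pdu in llc_pdus:
        body += struct.pack("!H", len(pdu)) + pdu
    return body + struct.pack("!I", zlib.crc32(body))

def check_mac_frame(frame):
    # Error detection at the MAC: recompute the CRC over everything but the trailer
    body, (crc,) = frame[:-4], struct.unpack("!I", frame[-4:])
    return zlib.crc32(body) == crc

frame = make_mac_frame(2, 1, [make_llc_pdu(0, b"cell-0"), make_llc_pdu(1, b"cell-1")])
print(len(frame), check_mac_frame(frame))  # 27 True
```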
3. Physical Layer


The physical layer specifies the transmission medium, signal encoding, 
synchronization, and bit transmission/reception. Although the MAC layer determines 
access to the channel, a wideband radio channel may be segregated several ways [41]. 
The simplest is time division multiple access (TDMA) in which a sender transmits during 
a fixed time slot. The channel may also be split into several independent, smaller 


channels using frequency division multiple access (FDMA) or code division multiple
access (CDMA) to allow multiple users to transmit simultaneously. Finally, TDMA may
be combined with either FDMA or CDMA.
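The way CDMA lets several users share the channel at the same time can be shown with orthogonal spreading codes. The sketch below is an idealized, synchronous, noise-free model using Walsh-Hadamard codes; practical military radios use other code families and must cope with timing offsets and interference.

```python
def walsh_codes(n):
    """Rows of an n x n Walsh-Hadamard matrix (n a power of two), as +/-1 lists."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def spread(bits, code):
    # Map bit 1 -> +1, bit 0 -> -1, then multiply by every chip of the code
    return [(1 if b else -1) * c for b in bits for c in code]

def despread(signal, code):
    # Correlate each code-length chunk with the user's code and take the sign
    n = len(code)
    out = []
    for i in range(0, len(signal), n):
        corr = sum(s * c for s, c in zip(signal[i:i + n], code))
        out.append(1 if corr > 0 else 0)
    return out

codes = walsh_codes(4)
tx_a = spread([1, 0, 1], codes[1])
tx_b = spread([0, 0, 1], codes[2])
channel = [a + b for a, b in zip(tx_a, tx_b)]  # both users transmit at once
print(despread(channel, codes[1]), despread(channel, codes[2]))  # [1, 0, 1] [0, 0, 1]
```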


D. LAYERED VTC OVER ATM 


1. ITU-T Multimedia Standards 


The ITU-T H-series recommends several standards for real-time multimedia 
communications, each targeting a different network architecture. The standards proposed 
for ATM networks are briefly reviewed to provide some motivation for the layered VTC 
over ATM implementations proposed in this dissertation. 

Each ITU-T H-series multimedia conferencing standard associates a set of video, 
audio, multiplex, and control standards into a multimedia terminal [42]. Each terminal 
provides point-to-point, real-time audio and video conferencing at various levels of 
quality with provisions for optional data transfer. Data transfer possibilities include 
graphics, still images, and control signals such as those needed for remote camera 
operation. Extensions to the base standards allow multipoint operation and encryption 
with appropriate network support. The ITU-T standards have found wide acceptance, 
and hardware implementations are readily available in PCI and compact PCI card 
formats. Two ITU-T standards address ATM networks: H.321 and H.310. 

H.321 is a first-generation standard that adapts the earlier H.320 recommendation for
ISDN networks to ATM/B-ISDN [42][43]. As expected from a standard adapted from ISDN
networking, H.321 allocates bandwidth in increments of 64 kbps. The baseline video
codec specified is H.261, which compresses color video at a constant bit rate in
increments of 64 kbps. H.261 supports two resolutions: CIF (352x288 pixels) and QCIF
(176x144 pixels). Baseline audio is compressed using the G.711 log-PCM codec,
providing low-delay, toll-quality narrowband audio at 64 kbps. H.321 uses the AAL1
protocol to support data channels equivalent to ISDN ‘B’ channels by mapping one ‘B’
channel per VCC.

H.310 is a native standard for videoconferencing over ATM/B-ISDN and includes
the earlier H.321 as a subset [42][44]. Figure II.9 gives a simplified functional
description of an H.310 terminal and associated standards for multiplexing, call
establishment, and data transfer. Taking advantage of the high bandwidth available in
B-ISDN networks, H.310 offers high-quality video using the MPEG-2 video codec and
high-quality audio using MPEG-1 Layer II audio. To support H.321 terminals, H.261
video and G.711 audio are also supported; H.263 video, a codec optimized for
low-bit-rate channels such as analog modems, is an option. H.310 terminals support a
variety of data rates, but all terminals are required to support the common rates of 6.144
and 9.216 Mbps. Calls are established by creating an initial VCC to set up a control
channel. This control VCC uses the AAL5 protocol. Once two terminals have established
a set of operating parameters, a second VCC is created to carry multiplexed audio and video.
Either the AAL1 or AAL5 protocol is used. Additional VCCs may be established to

Figure II.9: ITU-T H.310 B-ISDN Terminal.


2. Layered Video Considerations


Compared to the ITU-T terminal recommendations, layered video poses a
different set of considerations in determining a feasible network interface. Chief among
these is the desire to enable the ATM layer to discern which video layer owns an
individual cell. Associating layers with individual cells allows an ATM switch to exploit
the hierarchical nature of layered video through scheduling to actively control congestion
while maintaining the best possible end-to-end video quality. Another benefit is offering
recipients the ability to subscribe to any number of layers they initially choose as well as
a means to add or drop layers during the session. This is the core promise of RLM [45].

A secondary concern is to allow the network to identify logical elements within
the video stream, such as the frame header and group-of-blocks (GOB) boundaries (see
Figure II.1). Locating GOB boundaries provides another dimension to network
scheduling by allowing the switch to identify cells that will not aid video reconstruction
at the recipient due to previous cell losses (see Chapter VI). Two approaches for
allowing identification of video layers at the ATM layer are proposed here. The first is to
assign each video layer to a separate VCC. The second requires multiplexing individual
layers over a single VCC. Each approach affects the network interface design
differently: the most appropriate AAL protocol, schemes for manipulating the ATM cell
header, and the manner in which the multipoint-to-multipoint connection is established.
GOB identification is considered only briefly here; more details are provided in Chapter
VI. No attempt is made to provide a complete multimedia terminal specification such as
H.310. Instead, the goal is to demonstrate the feasibility of supporting layered video
within existing ATM standards.

In addition to the layering scheme, the choice of AAL protocol depends on the 
services required by the application. Here, we assume that the audio and video streams 
are not multiplexed as they are in H.310. Segregating the streams allows different service 
for audio and video and simplifies network scheduling with respect to the layered video. 

We first consider the audio stream. The tactical scenario requirements (see Table
I.1) limit the audio stream bit rate to 8 kbps. The G.711 and MPEG-1 Layer II codecs are
obviously incompatible with the scenario requirements. This is not surprising since
H.310 targets B-ISDN. However, other high-quality narrowband audio codecs are
available that specifically target low bit rates. Two suitable codecs specified in the H.324
recommendation for low-bit-rate circuit-switched networks, such as the PSTN, are the
G.723.1 and G.729 codecs. G.723.1 transmits at either 5.3 or 6.3 kbps and offers
near-toll-quality speech, although codec delay is rather large for VTC applications [5].
G.729 offers higher quality and lower coding delay for a similar level of complexity. Both
codecs offer silence detection to reduce bit rate by either not transmitting or transmitting
only background noise. Of the two, G.729 appears the best choice for the tactical
scenario considered here. Given that G.729 transmits at a fixed bit rate, the AAL1
protocol appears to be best suited.
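A quick calculation shows what carrying an 8 kbps codec such as G.729 over AAL1 implies. The sketch assumes every cell is completely filled: the AAL1 SAR-PDU carries 47 payload octets after its 1-octet header, and each 53-octet cell adds ATM header overhead. The resulting cell-fill delay (47 ms) is part of the end-to-end audio delay budget; partially filled cells would cut that delay at the cost of more overhead.

```python
AAL1_PAYLOAD_OCTETS = 47   # 48-octet SAR-PDU minus the 1-octet AAL1 SAR header
ATM_CELL_OCTETS = 53       # 5-octet header + 48-octet information field

def aal1_voice_stats(codec_bps):
    """Cell rate, cell-fill delay, and ATM-layer bit rate for a CBR voice codec,
    assuming every cell is completely filled before it is sent."""
    payload_bits = AAL1_PAYLOAD_OCTETS * 8
    cells_per_s = codec_bps / payload_bits
    fill_delay_ms = 1000 * payload_bits / codec_bps
    line_rate_bps = cells_per_s * ATM_CELL_OCTETS * 8
    return cells_per_s, fill_delay_ms, line_rate_bps

cells, delay, rate = aal1_voice_stats(8000)
print(f"{cells:.2f} cells/s, {delay:.1f} ms fill delay, {rate:.0f} bps at the ATM layer")
# 21.28 cells/s, 47.0 ms fill delay, 9021 bps at the ATM layer
```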

The question for the video stream is not which codec to use, since a layered coder
is assumed, but the type of rate control to employ. Three options are possible: CBR,
VBR with no constraints, and VBR with the bit rate constrained to a predetermined average.
Assuming a fixed quantization scheme at the encoder, compressed video is naturally
VBR since compression gain varies from frame to frame. Bit rate constraints come at the cost
of quality variations [46]. CBR tends to show larger fluctuations in visual quality relative
to VBR and may be unappealing at low bit rates. VBR with a predetermined mean bit
rate shows quality fluctuations intermediate between those of VBR and CBR. As indicated
above, VBR streams have another advantage in that bandwidth can be conserved through
statistical multiplexing, a significant advantage in low-bit-rate networks. However, resource
allocation is simpler if the mean bit rate is constrained since ATM traffic descriptors,
such as PCR and SCR, are easier to determine. For these reasons, the video stream is
assumed to be VBR constrained to a predetermined mean bit rate. Only the AAL1
protocol is rendered unsuitable by this assumption, and choosing among the remaining
protocols depends on limitations introduced by video layering, as discussed below.
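Conformance to ATM traffic descriptors such as PCR and SCR is defined by the Generic Cell Rate Algorithm (GCRA). The sketch below implements its virtual-scheduling form, GCRA(T, τ): T is the nominal cell spacing (1/PCR or 1/SCR) and τ the tolerance. A video source constrained to a mean rate is easier to describe with a (T, τ) pair that its cells will actually satisfy. Times and parameters in the example are illustrative.

```python
def gcra(arrival_times, increment, limit):
    """GCRA(T, tau), virtual-scheduling form.

    increment: T, the nominal inter-cell spacing (1/PCR or 1/SCR).
    limit: tau, the tolerance (e.g. CDVT, or burst tolerance for the SCR test).
    Returns one boolean per cell: True if the cell conforms.
    """
    tat = None                      # theoretical arrival time
    result = []
    for t in arrival_times:
        if tat is None:
            tat = t                 # first cell initializes TAT
        if t < tat - limit:
            result.append(False)    # cell too early; TAT is not updated
        else:
            result.append(True)
            tat = max(t, tat) + increment
    return result

# Times in ms: nominal spacing 10 ms (100 cells/s), tolerance 15 ms
print(gcra([0, 5, 10, 12, 40], increment=10, limit=15))
# [True, True, True, False, True]
```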

The last issue to consider is that of synchronization of the audio and video 
streams. We assume that if the application is given suitable timing information for each 
stream, then it is capable of synchronizing playback. Timing information is either 
provided to the application by the AAL or determined directly using time-stamps 
embedded in the application PDU. The former approach is available only if AAL1 or 
AAL2 is used. The latter is offered by encapsulating application data within an RTP
packet. Each RTP packet includes a 32-bit timestamp corresponding to the time when
the first octet of data was generated. The exact approach taken in this work is outlined 


below. 
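For reference, the RTP fixed header that carries this timestamp is 12 octets long. The sketch below packs it with version 2 and no padding, extension, or CSRC entries, and derives a video timestamp from a 90 kHz media clock, the rate conventionally used for video under the RTP audio/video profile (a G.729 audio stream would use its own 8 kHz clock). The payload type 96 and SSRC value in the example are hypothetical.

```python
import struct

RTP_VIDEO_CLOCK_HZ = 90_000  # media clock conventionally used for RTP video

def rtp_header(seq, timestamp, ssrc, payload_type, marker=False):
    """Pack the 12-octet RTP fixed header (V=2, P=0, X=0, CC=0)."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def video_timestamp(capture_time_s, offset=0):
    # Ticks of the 90 kHz clock since capture start, wrapped to 32 bits
    return (offset + round(capture_time_s * RTP_VIDEO_CLOCK_HZ)) % 2**32

# Frame captured 100 ms into the session; PT 96 and SSRC are illustrative
hdr = rtp_header(seq=1, timestamp=video_timestamp(0.1), ssrc=0x1234, payload_type=96, marker=True)
print(hdr.hex())  # 80e0000100002328 00001234 (printed without the space)
```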
3. Multiple VCC Case 


In the multiple VCC approach, each video layer is assigned a separate VCC and is
readily identified within the network by its VPI/VCI pair. For scheduling purposes, a
switch needs to logically associate the VPI/VCI pairs transporting the video layers from a
particular sender and to establish a hierarchy for priority service. A simple means of
logically associating layers is to assign one VPI per sender or to negotiate VPI/VCI pairs
in contiguous blocks*. Using multiple VCCs conveys several advantages. Individual
VCCs allow a great deal of flexibility in providing service on a per-layer
basis. The sender can negotiate different service and different QoS for each individual
layer, even in the absence of a dedicated scheduling algorithm for layered video.

Multiple VCCs also simplify the task of allowing end users to subscribe to individual
layers at call setup and dynamically add or drop layers once the VTC is in progress. A
penalty is paid due to the large number of connections. Call setup time is increased, and
changes to the multipoint-to-multipoint connection incur a proportionate increase in
signaling amongst the endpoints.
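The contiguous-block idea can be sketched as a lookup a switch might perform. The registration structure below is hypothetical: each sender is given one VPI and a run of consecutive VCIs, with the base layer on the lowest VCI, so the layer number (and hence its priority) follows directly from the VCI offset.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LayerBlock:
    """Hypothetical registration of one sender's video layers at a switch."""
    sender: str
    vpi: int
    base_vci: int     # VCI carrying the base layer
    num_layers: int   # layers occupy base_vci .. base_vci + num_layers - 1

def lookup_layer(blocks, vpi, vci):
    """Map a cell's VPI/VCI to (sender, layer); layer 0 is the base layer."""
    for b in blocks:
        if b.vpi == vpi and b.base_vci <= vci < b.base_vci + b.num_layers:
            return b.sender, vci - b.base_vci
    return None       # not a registered video layer

blocks = [LayerBlock("alpha", 1, 100, 3), LayerBlock("bravo", 2, 100, 3)]
print(lookup_layer(blocks, 1, 102))  # ('alpha', 2)
print(lookup_layer(blocks, 2, 100))  # ('bravo', 0)
```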

Service is provided to each layer using the AAL5 protocol. AAL5 offers the
lowest overhead of the VBR protocols: eight octets per CS-PDU and no additional
overhead in the SAR-PDUs. It is also the most appropriate choice if a higher-level
protocol, such as RTP, is employed.

Data transfer proceeds as shown in Figure II.10. The video compressor relays
application PDUs to the AAL after time-stamping each to facilitate synchronization
with the audio stream. An application PDU consists of a single GOB, multiple GOBs, or an
entire frame. The choice depends on the manner in which frame elements are exploited
by the coder. In the CS sublayer, an eight-octet trailer is appended, and the CS-PDU is
padded out to a multiple of 48 octets. The trailer includes the CPCS user-to-user
indication field, which allows transparent transfer of user information between end-users
or application layers. The user-to-user indication field identifies the video layer (0 =
base, 1 = first enhancement layer, and so on), which enables the end application to
associate each incoming VCC with a layer and correctly reassemble the video stream.
The SAR sublayer segments the CS-PDU into 48-octet SAR-PDUs; no headers or trailers
are necessary. At the ATM layer, each SAR-PDU is encapsulated into an ATM cell
information field.

* Negotiating VPIs and/or VCIs is not supported in UNI 3.1 but is supported by UNI 4.0 [47].
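The AAL5 processing for one video layer can be sketched as follows. The trailer layout (CPCS-UU, CPI, 2-octet Length, CRC-32) and the padding rule follow AAL5; the CRC routine uses generator 0x04C11DB7, processed MSB-first with preset and complemented remainder, which is assumed here to match the AAL5 CRC. Placing the layer number in CPCS-UU follows the scheme described above. Application PDUs are assumed to be at most 65535 octets.

```python
import struct

def crc32_aal5(data: bytes) -> int:
    """CRC-32, generator 0x04C11DB7, MSB-first, preset to all ones and
    complemented (assumed to match the AAL5 CPCS-PDU CRC)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc ^ 0xFFFFFFFF

def aal5_segment(app_pdu: bytes, layer: int):
    """Build a CPCS-PDU with the video layer in CPCS-UU and cut it into
    48-octet SAR-PDUs, each paired with its SDU-type bit."""
    pad_len = (-(len(app_pdu) + 8)) % 48            # pad so PDU + trailer fills whole cells
    # Trailer: CPCS-UU (layer id), CPI = 0, Length = payload octets, then CRC-32
    partial = app_pdu + bytes(pad_len) + struct.pack("!BBH", layer, 0, len(app_pdu))
    cpcs_pdu = partial + struct.pack("!I", crc32_aal5(partial))
    sar_pdus = [cpcs_pdu[i:i + 48] for i in range(0, len(cpcs_pdu), 48)]
    # SDU-type bit is 1 only in the cell carrying the trailer (the last one)
    return [(chunk, int(i == len(sar_pdus) - 1)) for i, chunk in enumerate(sar_pdus)]

cells = aal5_segment(bytes(100), layer=1)   # a 100-octet GOB from enhancement layer 1
print(len(cells), [bit for _, bit in cells])  # 3 [0, 0, 1]
```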
Figure II.10: Transmitting Layered Video Using AAL5 and Multiple VCCs.


Since the AAL5 SAR merely segments the CS-PDU, the endpoint CS sublayer
cannot distinguish between SAR-PDUs containing the CS-PDU payload and the SAR-PDU
containing the trailer that ends the CS-PDU. To distinguish between these cases,
the SDU-type bit in the payload type field is used. At the ATM layer, a CS-PDU consists
of zero or more ATM cells with the SDU-type bit set to zero followed by an ATM cell
with the SDU-type bit set to one. The latter indicates the presence of the CS-PDU trailer
and the end of the CS-PDU. This scheme also allows the network to determine the
boundaries of the application PDU by tracking changes in the SDU-type bit. Figure II.11
shows how a GOB, assuming that the application PDU consists of a single GOB, is
located within the ATM cell flow. Therefore, a scheduling algorithm could track the
SDU-type bit to incorporate GOB boundaries into scheduling decisions.
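A switch tracking GOB boundaries this way needs only the SDU-type bit of each cell on the VCC. The sketch below returns the index range of each complete application PDU; cells after the last SDU-type = 1 cell belong to a PDU still in progress and are not reported. The function name is illustrative.

```python
def pdu_boundaries(sdu_bits):
    """Given the SDU-type bits of successive cells on one VCC, return
    (first_cell, last_cell) index pairs for each complete application PDU."""
    bounds, start = [], 0
    for i, bit in enumerate(sdu_bits):
        if bit == 1:              # this cell carries the CS-PDU trailer
            bounds.append((start, i))
            start = i + 1         # the next cell begins a new PDU
    return bounds

# Three GOBs of 3, 1, and 2 cells, followed by a partial fourth GOB
print(pdu_boundaries([0, 0, 1, 1, 0, 1, 0]))  # [(0, 2), (3, 3), (4, 5)]
```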
Figure II.11: Use of the SDU Bit to Locate Application PDU Boundaries with AAL5.


Establishing a multipoint-to-multipoint connection follows the procedures
outlined under ATM multicast above, with the difference that a separate
point-to-multipoint connection must be established for each layer. The order in which
connections are established is potentially important if the network possesses limited
resources over any path that forms part of a connection. To preserve the hierarchical
nature of video layering, the first point-to-multipoint connection established should be the
VCC associated with the base layer. In turn, VCCs associated with the enhancement
layers are established, one by one, in order of each layer’s perceptual importance. While
establishing a complete set of connections in this manner entails a longer setup time than
negotiating each connection simultaneously, a hierarchical connection order prevents a lack
of resources from denying a connection to a more perceptually important layer in favor of
a less important layer. Therefore, the network arbitrates which layers receive connections
based on the resources present over all paths composing the point-to-multipoint
connection. If an endpoint workstation does not possess the capability to decode all the
layers comprising the video session, the workstation can refuse connection to unwanted
layers. The individual endpoint should also deny connection in the case of an illegal
layering arrangement. This may happen if the network does not permit a connection for a
layer while a less important layer is allowed to establish a connection due to its smaller
bandwidth demands.
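The hierarchical setup order and the endpoint's refusal rules can be summarized in a short sketch. The admission model is hypothetical and deliberately simple: a single bottleneck capacity along the path, with each layer's VCC admitted only if its rate fits in what remains. Because setup stops at the first refused layer, the accepted set is always a prefix of the hierarchy, which is exactly the property that rules out illegal layering arrangements.

```python
def establish_layers(layer_rates_kbps, path_capacity_kbps, decodable_layers):
    """Set up one point-to-multipoint VCC per layer, base layer first.

    Hypothetical model: the path admits a VCC only if its rate fits in the
    remaining capacity. The endpoint accepts a layer only if it can decode it
    and all more important layers were accepted, so the result is a prefix.
    """
    remaining = path_capacity_kbps
    accepted = []
    for layer, rate in enumerate(layer_rates_kbps):
        if layer >= decodable_layers:
            break             # endpoint refuses layers it cannot decode
        if rate > remaining:
            break             # network refuses; any later layer would be illegal
        remaining -= rate
        accepted.append(layer)
    return accepted

# Layer 2 (16 kbps) would fit after layer 1 is refused, but is not set up
print(establish_layers([32, 24, 16], path_capacity_kbps=50, decodable_layers=3))   # [0]
print(establish_layers([32, 24, 16], path_capacity_kbps=100, decodable_layers=2))  # [0, 1]
```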
4. Single VCC Case 


The case for limiting the layered video stream to a single VCC is driven by the
desire to minimize the number of active connections in the multipoint-to-multipoint
connection. While VCIs are not a scarce commodity (a single VPI can bundle as many
as 65536 VCIs, with the values 0-32 reserved [23]), signaling and control requirements
increase with the number of connections, which in turn increases call setup time.
An alternative approach is to multiplex cell flows from each layer within a single VCI.
Multiplexing flows over a single VCI is only supported by AAL2 and AAL3/4. Since
AAL3/4 has been largely replaced by AAL5, the problem of supporting a single VCC
rests on determining a suitable interface between the application layer, the AAL2
protocol, and the ATM layer.

Unlike the other AAL protocols, AAL2 specifies only a CS sublayer and does not
utilize a SAR sublayer [33]. The CS sublayer functionality is further split into
service-specific (SSCS) and common part (CPCS) sublayers. The simplest SSCS definition is
the null SSCS, which transfers application PDUs directly to the CPCS sublayer. Other
definitions remain under study, and an SSCS definition for layered video traffic is
proposed below. The CPCS sublayer multiplexes individual cell flows and provides
VBR traffic support.

The following service approach is proposed to adapt AAL2 for layered video.
Referring to Figure II.12, each layer is assigned a service access point (SAP) at the AAL
SSCS sublayer. The application PDU consists of a GOB, a contiguous set of GOBs, or a
frame from a particular layer. The application PDU is buffered within the SSCS sublayer
and transmitted in blocks to the CPCS sublayer. Block size is set at 44 octets to increase
transmission efficiency. If the application PDU length is not a multiple of 44 octets, a
final variable-length block of fewer than 44 octets is transmitted.

Figure II.12: Transmitting Layered Video Using AAL2 and a Single VCC.


The CPCS sublayer accepts blocks from each SSCS SAP and appends a three-octet
header to form a CPCS packet. Within the header, the Channel Identifier (CID)
uniquely identifies the layer number. The CID field is 8 bits in length which, after
allowing for reserved values, permits identification of up to 248 individual channels.
Since available channel numbers start at 8, one possible scheme is to number
channels with CID = 8 + layer number, where layer numbers start at zero for the base
layer. The length indicator field is set to reflect either a fixed payload length of 44 octets
or a smaller, variable value for the last segment in the application PDU if the application
PDU is not an even multiple of 44 octets. The CPCS packet is then loaded into a
CPCS-PDU with an 8-bit start field header. If the length of the last CPCS packet is less
than 47 octets, a trailer is added to pad the CPCS-PDU to 48 octets. The combined
overhead of the CPCS packet header and the CPCS-PDU start field header is exactly four
octets. Therefore, a block size of 44 octets at the SSCS sublayer simplifies processing by
the AAL since each CPCS packet and its associated CPCS-PDU is transported within
exactly one ATM cell.
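The one-packet-per-cell layout can be sketched as follows, assuming the standard AAL2 field sizes: a start field with a 6-bit offset field, 1-bit sequence number, and parity bit, and a 3-octet packet header with CID (8 bits), length indicator (6 bits, payload length minus one), UUI (5 bits), and HEC (5 bits). In this sketch the HEC is left as a zero placeholder rather than computing its CRC-5, and the parity bit is assumed to give the start field odd weight.

```python
def aal2_cell_payloads(app_pdu: bytes, layer: int, block: int = 44):
    """Carry one layer's application PDU as one CPS packet per 48-octet payload.

    Per cell: 1-octet start field, 3-octet packet header, up to 44 payload
    octets, padding. The HEC is a placeholder (CRC-5 not computed here).
    """
    cid = 8 + layer                                # CIDs below 8 are reserved
    cells = []
    for sn, offset in enumerate(range(0, len(app_pdu), block)):
        chunk = app_pdu[offset:offset + block]
        li = len(chunk) - 1                        # length indicator = octets - 1
        uui, hec = 0, 0
        header = ((cid << 16) | (li << 10) | (uui << 5) | hec).to_bytes(3, "big")
        stf = (0 << 2) | ((sn & 1) << 1)           # OSF = 0, alternating SN bit
        stf |= 0 if bin(stf).count("1") % 2 else 1 # parity: odd weight (assumed)
        cell = bytes([stf]) + header + chunk
        cells.append(cell + bytes(48 - len(cell))) # pad a short final packet
    return cells

cells = aal2_cell_payloads(bytes(100), layer=2)    # 100-octet GOB, layer 2 -> CID 10
print(len(cells), [c[1] for c in cells], [c[2] >> 2 for c in cells])
# 3 [10, 10, 10] [43, 43, 11]
```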

An alternate approach that reduces overhead is to buffer application PDUs at the
SSCS sublayer. Each application PDU is segmented into 44-octet blocks as before and
transmitted to the CPCS sublayer. If an application PDU is not an even multiple of 44
octets, the leftover octets are retained at the head of the SSCS buffer. When the next
application PDU is buffered and segmented at the SSCS sublayer, data from the last
application PDU is encapsulated into the first CPCS packet. Although this approach
transmits data from different application PDUs in the same ATM cell, overhead is
reduced considerably since every CPCS packet is filled to 44 octets, obviating the need to
ever pad the CPCS-PDU.
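The saving can be estimated by counting cells for a sequence of application PDUs. The PDU lengths below are hypothetical; the first function pads each PDU's final block (one CPCS packet per cell), while the second lets leftover octets share a cell with the next PDU, so only the very last block of the stream is padded.

```python
import math

def cells_with_padding(pdu_lengths, block=44):
    # Each PDU is segmented independently; its final short block fills a whole cell
    return sum(math.ceil(n / block) for n in pdu_lengths)

def cells_with_carryover(pdu_lengths, block=44):
    # Leftover octets wait in the SSCS buffer and share a cell with the next PDU
    return math.ceil(sum(pdu_lengths) / block)

gobs = [100, 60, 130]   # hypothetical GOB sizes in octets
print(cells_with_padding(gobs), cells_with_carryover(gobs))  # 8 7
```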

At the destination AAL, the CPCS sublayer strips the SF header off the CPCS- 
PDU and reads the CID field within the CPCS packet header to route the payload 
appropriately to the SSCS sublayer. No specific functionality is envisioned for the
receiver side of the SSCS sublayer. The SSCS sublayer merely accepts the payload from 
the CPCS sublayer and forwards it to the application layer. There is no need to recreate 
the application PDU since the decoder is assumed to be capable of interpreting the raw 
bit stream. 
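On the receiving side, demultiplexing reduces to reading the CID and length indicator of each packet. The sketch below assumes the one-packet-per-cell layout described above (1-octet start field, then the 3-octet header whose first octet is the CID and whose next 6 bits are the length indicator) and the CID = 8 + layer numbering; it simply appends each payload to its layer's byte stream for the decoder.

```python
from collections import defaultdict

def demux_cells(cell_payloads):
    """Route each CPS packet payload to its layer using layer = CID - 8.

    Assumed layout per 48-octet cell payload: start field (octet 0),
    CID (octet 1), LI in the top 6 bits of octet 2, payload from octet 4.
    """
    streams = defaultdict(bytearray)
    for cell in cell_payloads:
        cid = cell[1]
        li = cell[2] >> 2                     # payload length minus one
        streams[cid - 8] += cell[4:4 + li + 1]
    return dict(streams)

base = bytes([1, 8, 2 << 2, 0]) + b"abc" + bytes(41)   # CID 8, 3-octet payload
enh1 = bytes([1, 9, 1 << 2, 0]) + b"xy" + bytes(42)    # CID 9, 2-octet payload
streams = demux_cells([base, enh1, base])
print({layer: bytes(data) for layer, data in streams.items()})
# {0: b'abcabc', 1: b'xy'}
```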

The above approach allows the cell flows of each layer to be multiplexed over a
single VCC. However, the network is unable to distinguish between the different flows if
the only indication lies within the ATM cell information field. As ATM switches only
read cell headers, layer designation must occur using fields within the cell header as
shown in Figure II.4. By design, ATM cell headers are relatively small, incorporating
only the information required for ATM switches to perform their switching and
congestion control functions. Therefore, the sender has very little flexibility in setting
individual fields within the header that are not subject to being overwritten by switches.
However, the SDU-type bit and the CLP bit are available to the user [23]. Used together,
the two bits allow indication of up to four layers (although only three layers are employed
here), as indicated in Table II.3. The CLP bit is set for the lower-priority layers.
Setting the CLP bit does not necessarily mean that cells from enhancement layers are
automatically dropped during periods of congestion. The user is allowed to negotiate
QoS separately for the cell flow consisting of cells with the CLP bit set to zero and the
cell flow consisting of all cells (CLP = 0 + 1) [28]. Setting the CLP and SDU-type bits
requires extending AAL2 to communicate with the ATM layer in a manner similar to the
interaction between AAL5 and the ATM layer. One way to accomplish this is to
transfer the CID field value with the CPCS-PDU. The ATM layer uses the CID value to
determine an index into Table II.3, index = CID - 8, and sets the CLP and SDU-type bits
appropriately.


Layer Number    SDU bit    CLP bit
0               0          0
1               1          0
2               0          1
3 (not used)    1          1


Table II.3: ATM Cell-Tagging Scheme for Layered Video.
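Table II.3 amounts to writing the layer number as two bits, with the SDU-type bit as the low bit and CLP as the high bit. The sketch below maps a CID to those bits and places them in the fourth octet of a UNI cell header (low four VCI bits, 3-bit PTI, CLP), where for user-data cells the PTI is 0, EFCI, SDU-type. It also shows the reverse mapping a switch could apply.

```python
def tag_bits_for_cid(cid):
    """Table II.3: (SDU-type bit, CLP bit) for the layer carried on channel CID."""
    layer = cid - 8
    if not 0 <= layer <= 3:
        raise ValueError("two header bits can encode only four layers")
    return layer & 1, (layer >> 1) & 1

def header_octet4(vci, sdu_bit, clp_bit, efci=0):
    """Fourth octet of a UNI cell header: low 4 VCI bits, PTI (0, EFCI, SDU-type), CLP."""
    pti = (efci << 1) | sdu_bit
    return ((vci & 0xF) << 4) | (pti << 1) | clp_bit

def layer_from_octet4(octet4):
    # What a switch can recover from the header alone: layer = 2*CLP + SDU-type
    return 2 * (octet4 & 1) + ((octet4 >> 1) & 1)

for layer in range(4):
    sdu, clp = tag_bits_for_cid(8 + layer)
    print(layer, sdu, clp, layer_from_octet4(header_octet4(37, sdu, clp)))
```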


In the multiple VCC case, the SDU-type bit is available and enables the network
to determine the application PDU boundaries in order to incorporate logical video
elements such as a GOB or frame into scheduling decisions. The cell-tagging scheme
presented in Table II.3 does not permit a similar approach at the network level. An
alternative approach requires the AAL to segregate CPCS-PDUs resulting from each
layer’s application PDUs. The segregated CPCS-PDUs are then handed to the ATM
layer and transmitted sequentially. Since consecutive CPCS-PDUs then come from the
same channel, an application PDU appears to the network as a contiguous set of cells,
each with the same cell tags. By monitoring changes in the CLP and SDU-type bits, the
network can identify application PDU boundaries. This approach is shown in Figure II.13.
While convenient, concatenating application PDUs within the VCC impacts scheduling
performance. This issue is covered in more detail in Chapter V.
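Monitoring tag changes can be sketched as grouping consecutive cells with the same (CLP, SDU-type) pair. Note a limitation visible in the sketch: a boundary is detected only where the tag changes, so two back-to-back application PDUs from the same layer would appear as a single run.

```python
def tag_runs(tags):
    """Split a multiplexed cell flow into runs of identical (CLP, SDU-type) tags.

    tags: list of (clp, sdu) pairs, one per cell.
    Returns (layer, first_cell, last_cell) per run, with layer = 2*CLP + SDU-type.
    """
    runs = []
    for i, (clp, sdu) in enumerate(tags):
        layer = 2 * clp + sdu
        if runs and runs[-1][0] == layer:
            runs[-1] = (layer, runs[-1][1], i)   # extend the current run
        else:
            runs.append((layer, i, i))           # tag changed: new PDU starts
    return runs

# Base-layer PDU (2 cells), layer-1 PDU (1 cell), layer-2 PDU (2 cells)
print(tag_runs([(0, 0), (0, 0), (0, 1), (1, 0), (1, 0)]))
# [(0, 0, 1), (1, 2, 2), (2, 3, 4)]
```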
Figure II.13: Identifying Application PDUs in a Multiplexed Cell Flow.


Setting up a multipoint-to-multipoint connection requires each sender to establish 
separate point-to-multipoint connections for the audio and video streams. Compared to 
the multiple VCC approach, creating and maintaining a VTC session with a single VCC 
reduces signaling requirements. However, using a single VCC reduces flexibility in 
heterogeneous networks. When the initial connection is established, the sender must 
negotiate acceptable QoS for the entire video stream. While this appears to negate the 
flexibility offered by transmitting layers, the sender still has the option of negotiating 
QoS separately for the CLP = 0 and CLP = 0 + 1 cell flows. For similar reasons, 
individual endpoints cannot refuse individual layers at call setup and must accept the 
entire video stream or decline the connection. Still, it is desirable to allow an endpoint to 
dynamically drop layers, both to ensure that the more important layers arrive and to 
reduce bandwidth demands within the network if no downstream nodes require certain 
layers. Chapter VI proposes a scheme that allows the network scheduler to effectively 
drop individual layers within a VCC when no destination indicates an interest in those 


layers. 


This chapter examined architectures suitable for transporting real-time, interactive
multimedia information streams. A suitable network architecture needs to meet the
following requirements: multicast support, QoS guarantees, and real-time support. The
ensuing discussion indicated that only ATM networks currently meet all three
requirements. Given that ATM is a viable networking architecture, two approaches are
presented to transmit layered video. The first approach assigns each layer to a separate
VCI using AAL5. This approach is the most versatile in allowing network access to
individual layers; it scales well and provides easy access to GOBs within each layer. The
primary drawback is the increased signaling in a multicast scenario since each individual
connection represents the root of a multicast tree. The second approach multiplexes each
layer across a single VCI using AAL2. This approach offers quicker call setup and
minimizes signaling in multicast scenarios but requires modification to the CPCS
sublayer to tag each cell with an appropriate identifier for each layer. In addition,
a single VCI cannot scale beyond four layers, and organizing the stream into recognizable
GOBs is somewhat complicated.
III. VIDEO CODING TECHNIQUES


Even when considering the modest requirements outlined for the video
teleconferencing scenario presented in Chapter I, raw video signals are very bandwidth
intensive. Consider an example using the specifications listed in Table I.1 with grayscale
video only. Sending an uncompressed grayscale video stream at 8 bits per pixel requires
a bandwidth of approximately 2 Mbps; this is not an insurmountable requirement with a
dedicated wireline ATM network but is clearly excessive for tactical video
teleconferencing. Restricting the video stream to an average of 64 kbps requires a
compression gain of about 31 to 1, or an average bit allocation of 0.26 bits per pixel (bpp).
Transmitting a true-color video sequence over the same channel would require a
compression gain three times higher.
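The arithmetic above can be reproduced directly. This sketch works from the rounded 2 Mbps raw rate quoted in the text rather than from the individual Table I.1 parameters, and assumes true color means 24 bits per pixel (three 8-bit components), which triples the raw rate.

```python
def compression_requirements(raw_bps, target_bps, bits_per_pixel=8):
    """Return (required compression gain, average bits per pixel after coding)."""
    ratio = raw_bps / target_bps
    return ratio, bits_per_pixel / ratio

gray = compression_requirements(2_000_000, 64_000)              # 8 bpp grayscale
color = compression_requirements(3 * 2_000_000, 64_000, 24)     # 24 bpp true color
print(gray)   # (31.25, 0.256)
print(color)  # (93.75, 0.256)
```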

This chapter presents a basic discussion of hybrid video coding and includes 
transform coding, motion compensation, quantization, and entropy encoding. A quick 
measure for quantifying distortion due to quantization is introduced as a measure of 
picture quality. The MPEG and H.263 video coding standards are described and 
examined for error resilience. Finally, wavelet-based image compression is presented in 


preparation for the layered video discussion in the next chapter. 
A. VIDEO COMPRESSION OVERVIEW 


Video coding involves a combination of removing perceptually redundant
content, representing information efficiently through lossless coding, and exploiting
frame-to-frame correlation within a video sequence. Motion video is typically low-pass
in nature, and the human eye places greater relative weight on lower frequencies than on
higher frequencies [6]. Therefore, 2-D transform methods are used to generate an
equivalent frequency domain representation, a process that is lossless and invertible.
Using this representation, variations in human perception are exploited by quantizing the
resulting coefficients to different degrees of precision, with more precision granted to the
lower frequencies. Quantization reduces the dynamic range of the coefficients, which results in
information loss but enables the coefficients to be represented with fewer bits. Usually,
the least relevant coefficients are zeroed out during quantization, thus creating runs of
zeros. Since there is little need to explicitly represent the zeros, run-length coding is used
to generate a more compact representation that is, in turn, replaced by a more efficient,
lossless variable-length coding (VLC). Taken collectively, these techniques are referred
to as spatial compression and form the basis of image compression standards, such as
JPEG.

A video codec must compress a time-varying video sequence consisting of a
series of frames spaced at equal time intervals. The codec may or may not exploit the
temporal dimension, depending on the application requirements. The simplest approach is
to ignore any correlation between individual frames and compress each frame
independently as if it were a still image. This approach is known as intraframe coding,
and the resulting compressed frames are referred to as I-frames. An example is Motion-
JPEG, which uses JPEG to code individual frames. Intraframe coding offers the
advantage of error resilience since decode errors are always confined to the current
frame. However, compression is limited to about 0.5 bits/pixel at acceptable image
quality [6]. Higher compression gains are possible, for the same quality, by exploiting
the high degree of correlation that video frames tend to exhibit from frame to frame.
Interframe coding removes redundancy by coding only the differences between
successive frames. When these differences arise due to motion, interframe coding yields
compression gains that vary with the degree and type of motion. Static frames compress
very well, while rapid motion tends to degrade compression performance. The drawback
to interframe coding is the dependence between successive frames at the decoder. If
errors occur in the current frame, the errors tend to propagate temporally between
successive frames as well as spatially within the frames. Of course, if two successive
frames are not correlated, perhaps due to a scene change, interframe coding performs no
better than intraframe coding, and typically worse due to additional overhead [7].
Therefore, video codecs such as H.263 and MPEG incorporate both types of coding for
efficiency and, in some cases, to place an upper bound on error propagation.
B. VIDEO CODING HIERARCHY 


To facilitate different aspects of video coding and decoding, the video stream is
organized into a hierarchy of logical elements. The organizational scheme varies from
coder to coder, but the most common elements are presented below.

The basic display unit is the picture or frame, which consists of a rectangular array
of pixels; each pixel is in turn represented by a data structure indicating its color and
luminosity. The dimensions of the array give the picture resolution, expressed as
columns × rows, and the codec of choice determines the available resolutions. A set
number of contiguous pictures is organized into a group of pictures (GOP). A GOP
usually consists of an intraframe-coded picture followed by a series of interframe-coded
pictures, and its length influences the compression gain.

Within a frame, pixels are organized, in order of increasing size, into blocks,
macroblocks, and groups of macroblocks (GOBs) or slices. A block is an 8×8 array of
pixels and is the basic element for transform coding operations, such as the discrete
cosine transform (DCT). Motion compensation is applied at the macroblock (MB) level,
a 16×16 array of four blocks, to reduce the associated overhead and computational
expense. A frame may be viewed as being composed of rows of macroblocks. For
example, a frame with a resolution of 176×144 pixels contains nine rows of macroblocks
with eleven macroblocks per row. One or more contiguous rows of macroblocks are
termed a GOB or a slice, depending on the codec. GOB is the more general term, while
the term slice is defined within the MPEG-1/2 standards [6]. GOB headers, along with
the frame header, serve as reference points that allow the decoder to resynchronize with
the incoming bit stream after decode errors caused by lost packets or bit errors. A
representation of the hierarchy superimposed on the compressed bit stream is shown in
Figure III.1; the length of each compressed frame varies due to variable compression
gains.
Figure III.1: Organizational Hierarchy for Compressed Video.
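The block/macroblock arithmetic for a 176×144 frame can be checked with a short sketch. It assumes luminance only (gray-scale, as in the example above) and one macroblock row per GOB, which is what H.263 uses at this resolution; the function name is hypothetical.

```python
def macroblock_layout(width, height, mb_size=16, block_size=8):
    """Return (mb_rows, mb_cols, block_origins) for a luminance frame.

    block_origins lists the top-left (row, col) of every block, grouped by
    macroblock in raster order (four 8x8 blocks per 16x16 macroblock)."""
    if width % mb_size or height % mb_size:
        raise ValueError("frame dimensions must be multiples of mb_size")
    mb_rows, mb_cols = height // mb_size, width // mb_size
    origins = []
    for mr in range(mb_rows):            # assumed: one MB row == one GOB
        for mc in range(mb_cols):
            r0, c0 = mr * mb_size, mc * mb_size
            for dr in range(0, mb_size, block_size):
                for dc in range(0, mb_size, block_size):
                    origins.append((r0 + dr, c0 + dc))
    return mb_rows, mb_cols, origins

rows, cols, blocks = macroblock_layout(176, 144)
print(rows, cols, rows * cols, len(blocks))   # MB rows, MBs per row, MBs, blocks
```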


C. INTRAFRAME CODING 


Intraframe coding (or spatial compression) is essentially the same as still image
compression. Each frame is compressed independently by removing redundant
information within that frame, balancing compression against image quality, and coding
the remaining information in a more efficient manner. No attempt is made to exploit
temporal correlation between frames. The three steps comprising intraframe
coding are shown in Figure III.2 and are explained further below.
Figure III.2: Overview of the Steps Comprising Intraframe Coding.
1. Transform Coding 


A frame represents a sampled version of the original scene at a single instant in
time. Contiguous regions of samples (or pixels) tend to be highly correlated, and in
practice compression through direct scalar quantization is inefficient.* Instead,
applying a suitable linear transform to decorrelate the samples gives a greater level
of compression for a given encoder complexity [48].

A suitable transform increases compression efficiency as follows. A signal is
decorrelated if application of the transform diagonalizes the signal’s autocorrelation
matrix; equivalently, the resulting transform coefficients are uncorrelated. An optimal
transform packs the signal energy into the smallest possible number of coefficients, a
property known as “energy packing” efficiency [48]. If the coefficients are arranged in
decreasing order of magnitude, retaining only the first k out of N coefficients gives the
least distortion as measured by MSE. Consequently, although the transform itself is
lossless, a given level of quantization results in the least distortion of the original data.

Another advantage of transforms is that the new domain is often more appropriate
for perceptually based quantization, since certain transform coefficients hold greater
perceptual relevance than others. For example, the human visual system (HVS) places the
most importance on low-frequency details in images or video [6]. This dependency may be
exploited by using frequency-based transforms and then distributing quantization errors in
relation to the relative importance of each coefficient.

* Still, direct techniques are employed where lossless compression is the primary concern.

In theory, the discrete-time Karhunen-Loeve transform (KLT) provides the
greatest energy packing efficiency [49]. However, the KLT is both computationally
intensive (on the order of N²) and signal dependent, requiring a separate eigenvector
calculation for each transformed data block. These liabilities preclude the use of the KLT
in video compression. Instead, video coders use transforms that approximate the KLT’s
energy packing efficiency and possess more efficient algorithms.

The most widely used transform for image processing is the two-dimensional
discrete cosine transform (DCT). The DCT provides the closest energy packing
performance to the KLT, and numerous fast algorithms are available, frequently
implemented in hardware, that reduce the computational effort to the order of N log₂ N [6].
For example, an 8×8 2-D DCT can be implemented with as few as 54 multiplications
[50].

A frame is transformed by dividing it into N×N blocks of pixels and
applying the 2-D DCT to each individual block. The typical block size is 8×8. Larger
block sizes are possible, but the pixels tend to be less correlated, which decreases the
resulting compression gain. Denoting the original block as f(i,j) and the transformed
coefficient block as F(u,v), the 2-D DCT is given by [6]

$$F(u,v) = \frac{2}{N}\,C(u)\,C(v)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} f(i,j)\cos\frac{(2i+1)u\pi}{2N}\cos\frac{(2j+1)v\pi}{2N} \tag{III.1}$$

where u and v are the horizontal and vertical frequencies, respectively, and

$$C(w) = \begin{cases} 1/\sqrt{2}, & w = 0 \\ 1, & w = 1, \dots, N-1 \end{cases} \qquad w \in \{u, v\} \tag{III.2}$$

The inverse DCT is given by

$$f(i,j) = \frac{2}{N}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\cos\frac{(2i+1)u\pi}{2N}\cos\frac{(2j+1)v\pi}{2N} \tag{III.3}$$

Transforming an 8×8 block of pixels results in a block of 64 coefficients with a
spatial frequency distribution as shown in Figure III.3. The F(0,0) coefficient represents
the DC value, while the remaining coefficients are termed AC coefficients. Figure III.4
indicates how image elements map into the frequency domain via the 2-D DCT [6].
Individual blocks within a frame tend to show little variation from pixel to pixel, an
indication of low-pass frequency content. Given this condition, the magnitude of the
DCT coefficients is largest in the region about the DC coefficient and diminishes with
increasing frequency.
Figure III.3: Frequency Interpretation of DCT Coefficients.
Figure III.4: Structural Decomposition of Image Elements [6].
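The 8×8 2-D DCT discussed above, and its energy packing on a smooth block, can be sketched with NumPy as a separable matrix product. This is a direct (not fast) implementation written for illustration; the function names and the test block are assumptions, not taken from [6].

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT basis: row u is sqrt(2/n) * C(u) * cos((2i+1)u*pi / 2n)."""
    i = np.arange(n)
    u = i.reshape(-1, 1)
    basis = np.sqrt(2.0 / n) * np.cos((2 * i + 1) * u * np.pi / (2 * n))
    basis[0, :] /= np.sqrt(2.0)          # C(0) = 1/sqrt(2)
    return basis

def dct2(block):
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T               # 1-D DCT down the columns, then along the rows

def idct2(coeffs):
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c              # orthonormal basis: inverse is the transpose

# A smooth 8x8 block (gentle gradient), typical of low-pass image content.
i, j = np.mgrid[0:8, 0:8]
block = 100.0 + 4 * i + 2 * j
coeffs = dct2(block)

energy = coeffs ** 2
low = energy[:2, :2].sum() / energy.sum()
print(f"share of energy in the 2x2 lowest-frequency coefficients: {low:.4%}")
```

Because the basis is orthonormal, total energy is preserved (the transform is lossless), yet nearly all of it lands in a handful of coefficients near F(0,0).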


The need for data blocking in DCT-based compression becomes a liability at
high levels of compression. Compression tends to remove high-frequency components,
which smooths the visual content of each block and creates “blocking artifacts” that
disturb the continuity of the frame. The same effect also leads to “ringing” artifacts
around sharp edges [3].
2. Scalar Quantization


The DCT coefficients are quantized to reduce precision, which allows each
coefficient to be represented with fewer bits. Quantization may also remove the least
significant coefficients by setting their value to zero. The tradeoff is added quantization
noise, which shows up as distortion within the reconstructed image. The most common
quantization scheme is uniform quantization, wherein each coefficient F(u,v) is
divided by the quantizer step size Q(u,v) and the result is rounded to the nearest integer
as follows [5]:

$$F_q(u,v) = \operatorname{round}\!\left(\frac{F(u,v)}{Q(u,v)}\right) \tag{III.4}$$

The reconstructed value is found by multiplying the quantized coefficient by the
quantizer step value, F_q(u,v) × Q(u,v). As Eq. (III.4) implies, the quantizer step value
may vary with each DCT coefficient, as discussed below. In this case, Q(u,v) represents
an element from an N×N quantizer matrix. Alternatively, a single value may be used for
the entire block for simplicity. Although uniform quantization is widely used, it is not
optimal, since analysis has shown that individual coefficients are not distributed
uniformly [51]. Other approaches have been suggested to reduce the quantization error,
such as employing a separate Max-Lloyd quantizer for each coefficient [52], but the gain
does not appear to outweigh the computational effort.
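Uniform quantization and reconstruction as described above can be sketched in a few lines. The coefficient values and the single step size of 16 are made up for illustration; note that NumPy's `np.round` rounds exact halves to the nearest even integer, a detail that real codecs may define differently.

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform quantization: divide by the step size(s), round to nearest integer.
    `step` may be a scalar or an N x N quantizer matrix (NumPy broadcasting)."""
    return np.round(np.asarray(coeffs, dtype=float) / step).astype(int)

def dequantize(levels, step):
    """Reconstruction: multiply each quantized level by its step size."""
    return levels * np.asarray(step, dtype=float)

coeffs = np.array([812.0, -47.3, 21.9, -6.2, 3.8, -1.1])   # illustrative values
step = 16.0                                 # one step size for the whole block
levels = quantize(coeffs, step)
recon = dequantize(levels, step)
print(levels)                               # small integers, several of them zero
print(np.abs(coeffs - recon).max())         # never exceeds step / 2
```

The reconstruction error per coefficient is bounded by half the step size, which is why the step size directly controls distortion.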

Since not all coefficients are significant, some may be discarded prior to
quantization [6]. In maximum variance zonal sampling, the coefficients are ordered by
the magnitude of their variance, and a fraction of the N² coefficients with the largest
variances is retained, with the remaining coefficients set to zero. Threshold sampling
performs the same function but retains coefficients on the basis of the largest magnitude
[6].
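The two selection rules above differ only in what they rank: zonal sampling ranks coefficient positions by variance over many blocks and applies one fixed mask, while threshold sampling ranks the coefficients of each block by magnitude. A minimal sketch, with hypothetical function names and a fixed count k of retained coefficients as an assumption:

```python
import numpy as np

def threshold_sample(coeffs, k):
    """Threshold sampling: keep the k coefficients of largest magnitude."""
    flat = coeffs.ravel()
    keep = np.argsort(-np.abs(flat), kind="stable")[:k]
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(coeffs.shape)

def zonal_mask(blocks, k):
    """Maximum variance zonal sampling: pick the k positions whose variance
    across a set of blocks (shape: num_blocks x N x N) is largest."""
    var = blocks.var(axis=0).ravel()
    keep = np.argsort(-var, kind="stable")[:k]
    mask = np.zeros(var.shape, dtype=bool)
    mask[keep] = True
    return mask.reshape(blocks.shape[1:])

F = np.zeros((8, 8))
F[0, 0], F[0, 1], F[3, 3], F[7, 7] = 100, -50, 5, -60
print(np.argwhere(threshold_sample(F, 2)))      # keeps (0, 0) and (7, 7)

blocks = np.zeros((4, 8, 8))
blocks[:, 0, 0] = [10, -10, 20, -20]            # high-variance position
blocks[:, 1, 0] = [1, -1, 2, -2]                # low-variance position
print(np.argwhere(zonal_mask(blocks, 2)))       # the same mask is applied to every block
```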

However, the most common approach is to weight the relative importance of each
coefficient by careful selection of the quantizer step values Q(u,v). Small quantizer step
values yield less distortion but require more bits. Larger quantizer step values introduce
more distortion but tend to produce more zeros and require fewer bits. Choosing the
optimal step size requires selecting a suitable criterion, through either a bit-allocation
approach or human visual system (HVS) modeling. In bit allocation, the step sizes are
chosen to minimize distortion within a bit budget for the block or frame. One optimal
scheme varies each quantizer in proportion to the variance of the coefficient, which yields
the same average distortion for each coefficient [48]. However, bit-allocation schemes
fail to account for human sensitivity to different spatial frequencies. Instead, most
international coding standards, such as JPEG and MPEG, employ quantizer matrices
based on HVS models. Using HVS models as a reference, the quantizer step sizes are
chosen such that lower-frequency coefficients are quantized more finely while
higher-frequency coefficients are quantized more coarsely [6]. The HVS is also more
sensitive to luminance than to chrominance, so separate quantizer matrices are developed
for each.
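The effect of a frequency-weighted quantizer matrix can be illustrated with a made-up matrix whose step size grows linearly with u + v. This matrix is purely hypothetical and is not the JPEG or MPEG default table; it only mimics the HVS-motivated trend described above.

```python
import numpy as np

def frequency_weighted_matrix(n=8, base=8, slope=4):
    """Hypothetical quantizer matrix: Q(u,v) = base + slope*(u+v), so low
    frequencies get fine steps and high frequencies get coarse ones."""
    u, v = np.mgrid[0:n, 0:n]
    return base + slope * (u + v)

Q = frequency_weighted_matrix()
F = np.full((8, 8), 17.0)              # same magnitude at every frequency
levels = np.round(F / Q).astype(int)
print(levels)                          # nonzero levels cluster near the DC corner
```

With equal-magnitude input, every coefficient on or beyond the anti-diagonal u + v = 7 quantizes to zero, so the surviving information is concentrated at low frequencies.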

A desirable feature in video encoders is rate control of the outgoing compressed
video stream, since each frame’s compression gain depends on the frame’s contents. For
example, the encoder may attempt to maintain a constant bit rate (CBR) or a constant
average bit rate, or it may allow the bit rate to vary without constraint (VBR). Control is
exercised by varying video quality to achieve the desired bit rate. Referring to Figure
III.2, only the quantizer introduces distortion and affects the reconstructed quality of the
frame. Therefore, rate control schemes use feedback to dynamically alter the distortion
introduced at the quantizer.* The simplest approach is to apply a scaling factor to the
quantizer matrix to increase or decrease the magnitude of each element. However,
controlling bit rate reduces the coder’s freedom to control quality: CBR video displays
wider variations in visual quality than VBR video, which does not constrain bit rate.

* The mix of intraframe and interframe coding also affects the bit rate but is usually set
prior to encoding and not varied dynamically.
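A frame-level feedback loop that adjusts the quantizer scaling factor can be sketched as follows. The encoder is replaced by a hypothetical model in which bits = complexity / scale, the frame rate of 10 frames/s is an assumption, and the scale is clamped to 1 through 31 (the range of the H.263 QUANT parameter). Practical controllers also track buffer occupancy and smooth their updates.

```python
def rate_control(complexities, target_bits, scale=4.0, qmin=1.0, qmax=31.0):
    """Frame-level feedback rate control by scaling the quantizer.

    Hypothetical encoder model: bits = complexity / scale (coarser steps ->
    fewer bits). After each frame the scale is multiplied by the ratio of
    bits produced to target bits, then clamped to [qmin, qmax]."""
    produced = []
    for complexity in complexities:
        bits = complexity / scale             # stand-in for encoding one frame
        produced.append(bits)
        scale = min(qmax, max(qmin, scale * bits / target_bits))
    return produced

# 64 kbps at an assumed 10 frames/s -> 6400 bits per frame.
# Scene complexity doubles halfway through the sequence.
bits = rate_control([64_000] * 5 + [128_000] * 5, target_bits=6_400)
print([round(b) for b in bits])
```

Because the controller reacts only after a frame has been coded, each jump in scene complexity produces a one-frame overshoot before the scale settles again.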
3. Entropy Encoding


The quantized coefficients may be represented more efficiently using source or
entropy coding, thereby further increasing the compression gain. Video coders use a
combination of run-length encoding and variable-length coding.

Run-length encoding (RLE) is the simplest form of entropy coding and is
frequently employed in both lossless and lossy compression schemes. Using RLE, a data
block is parsed to locate sequences of repetitive values. Each sequence is replaced by a
codeword consisting of a delimiter and the number of times the value is repeated. If the
data block contains a great deal of repetitive information, a significant reduction in size is
possible. Following quantization, the coefficient block typically contains a large number
of zeros, especially amongst the high-frequency coefficients [6]. As the compression
gain depends on the length of the runs, rearranging the coefficient block as a vector in
zig-zag fashion, starting from the DC coefficient F(0,0) and ending at the F(7,7)
coefficient, has been demonstrated to increase the run length of the zeros. Different
codewords are used, but the most common scheme consists of the run length of zeros
followed by the size or magnitude of the next non-zero value. If no non-zero values
remain, a special end-of-block (EOB) codeword replaces the sequence.
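The zig-zag scan and (run, value) coding described above can be sketched as below. For simplicity the whole vector, including the DC coefficient, is run-length coded and each symbol carries the value itself; JPEG and MPEG code the DC coefficient separately and map values to size categories, which this sketch omits. Function names are hypothetical.

```python
import numpy as np

def zigzag_order(n=8):
    """(row, col) pairs in zig-zag order: anti-diagonals of increasing
    row + col, alternating direction on successive diagonals."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def run_length(levels):
    """Encode a quantized block as (zero_run, value) pairs plus 'EOB'."""
    vector = [int(levels[r, c]) for r, c in zigzag_order(levels.shape[0])]
    symbols, run = [], 0
    for value in vector:
        if value == 0:
            run += 1
        else:
            symbols.append((run, value))
            run = 0
    if run:                      # trailing zeros collapse into a single EOB
        symbols.append("EOB")
    return symbols

q = np.zeros((8, 8), dtype=int)
q[0, 0], q[0, 1], q[2, 0] = 12, -3, 2
print(run_length(q))             # [(0, 12), (0, -3), (1, 2), 'EOB']
```

Sixty-one trailing zeros cost a single EOB symbol, which is where most of the gain from the zig-zag ordering comes from.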

After RLE, the quantized coefficient block is represented by a set of codewords,
each representing a symbol drawn from a larger source alphabet. Variable-length
coding (VLC) minimizes the average codeword length by assigning shorter codewords to
the most probable symbols and longer codewords to the least likely symbols, while
keeping every codeword uniquely decipherable. Huffman coding is the most widely used
entropy-encoding algorithm and is guaranteed to produce a minimum average length,
uniquely decipherable code [5]. The Huffman algorithm uses each symbol’s probability of
occurrence and builds a prefix code using an optimum binary-branching tree. Since the
coder and the decoder must use the same codebook and generating a Huffman table is
computationally expensive, standard tables are normally pre-defined using data drawn
from test images. An optimal representation is not guaranteed, but encoding and
decoding are faster and the need to transmit the VLC table is avoided.
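The tree construction can be sketched with Python's `heapq`: the two least probable nodes are merged repeatedly, and codewords are read off the root-to-leaf paths. The symbol alphabet and probabilities below are illustrative; a real coder would use run/value symbols and pre-defined tables as described above.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a Huffman prefix code from a {symbol: probability} mapping.
    Internal nodes are lists [left, right]; symbols must be hashable (so
    they can never be lists), which keeps leaves and nodes distinct."""
    tiebreak = count()                   # heap never has to compare nodes
    heap = [(p, next(tiebreak), sym) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                   # degenerate one-symbol alphabet
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)    # two least probable nodes
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), [left, right]))
    codes = {}
    def assign(node, prefix):
        if isinstance(node, list):       # internal node: branch on 0 / 1
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix
    assign(heap[0][2], "")
    return codes

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
print(code)
```

For these dyadic probabilities the average codeword length is 1.75 bits, exactly the source entropy; for general probabilities Huffman codes come within one bit of it.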
4. Quality of Reproduced Video 


Given that video coders trade compression gain for image quality, quantifying the
level of distortion introduced by coding is useful in evaluating different coding
schemes. A useful measure of image distortion D is the mean square error (MSE)
between the original (x) and reconstructed (x̂) images [6]:

$$D = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[x(i,j) - \hat{x}(i,j)\right]^2 \tag{III.5}$$

for an M × N image. Using the MSE to quantify distortion D, the signal-to-noise ratio
(SNR) is determined as

$$\mathrm{SNR} = 10\log_{10}\frac{\sigma^2}{D} \tag{III.6}$$

where σ² is the input variance. The most widely published measure of image quality is
the peak signal-to-noise ratio, given by [6]

$$\mathrm{PSNR} = 10\log_{10}\frac{K^2}{D} \tag{III.7}$$

where K is the maximum peak-to-peak value in the image, 255 for the typical 8-bit
image. For example, a typical PSNR for a JPEG-encoded grayscale image is
28 dB at 0.5 bits/pixel [6].

Using MSE as a measure of image quality does have drawbacks. MSE does not
correlate closely with perceptual quality, since all errors are given equal weight. Two
compression techniques yielding the same MSE for an image may still differ in
perceptual quality [6].
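The three quality measures above translate directly into NumPy. This is a minimal sketch; the 2×2 test image is arbitrary, and an identical reconstruction is reported as infinite SNR/PSNR rather than raising a division error.

```python
import numpy as np

def mse(original, reconstructed):
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    return float(np.mean(diff ** 2))

def snr_db(original, reconstructed):
    """SNR relative to the variance of the original image."""
    d = mse(original, reconstructed)
    var = float(np.var(np.asarray(original, dtype=float)))
    return float("inf") if d == 0 else 10 * np.log10(var / d)

def psnr_db(original, reconstructed, peak=255.0):
    """PSNR with peak-to-peak value K (255 for 8-bit images)."""
    d = mse(original, reconstructed)
    return float("inf") if d == 0 else 10 * np.log10(peak ** 2 / d)

x = np.array([[10, 20], [30, 40]])
y = x + 1                                   # every pixel off by one level
print(mse(x, y), round(snr_db(x, y), 2), round(psnr_db(x, y), 2))
```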
D. INTERFRAME CODING 


Interframe coding exploits frame-to-frame correlation, or temporal redundancy, to
deliver greater compression gains for a given level of quality. The degree of redundancy
depends on the scene’s motion content, due either to motion of objects within the scene or
to scene movement caused by a camera pan. Static scenes with little motion show a high
amount of frame-to-frame redundancy. For example, the VTC scenario considered in this
work assumes motion video sequences consisting of a “talking head,” i.e., a single speaker
talking against a static background. An opposite example is a scene change, where
successive frames have completely different content.

Several source-coding techniques are employed to remove temporal redundancy,
including block updating, differential pulse code modulation (DPCM), and motion
compensation. Each technique is suitable for a certain range of motion content.
Generally, exploiting redundancy as motion content increases requires more complex
techniques, which in turn decrease decoder robustness. As stated above, interframe
coding offers the potential for a lower bit rate at a given level of quality or, equivalently,
better quality at a given bit rate. The relative gain over intracoding for the intercoding
techniques presented here is documented in [53] for low- and high-motion video
sequences.
1. Block Updating


The simplest interframe coding approach is a variation of intraframe coding. In
low-motion video scenes, such as “talking head” video, motion is confined to a small
region within the scene while the background remains static. Block updating conserves
bandwidth by coding and transmitting only those blocks that have changed perceptibly
since the last frame [54]. Each block f(i,j) is compared to its counterpart in the previous
frame, and a distance metric is calculated. If the distance is below a certain threshold, no
update for that block is transmitted. Otherwise, the block is intracoded as in Figure III.2
and transmitted. Block updating is sometimes combined with an aging scheme that
periodically forces block updates, which mitigates hysteresis problems and guarantees
that members joining a dynamic VTC session receive the full scene within some set
interval [45].
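The update decision, including the aging rule, might look like the sketch below. The distance metric (mean absolute difference), the threshold, and the maximum age are all assumptions chosen for illustration; the text leaves these choices to the coder.

```python
import numpy as np

def blocks_to_update(current, previous, ages, threshold, max_age, n=8):
    """Return the (row, col) origins of n x n blocks to intracode and send.

    A block is sent when its distance from the co-located block in the
    previous frame exceeds `threshold`, or when it has gone `max_age` frames
    without an update (aging refresh). `ages` holds one counter per block
    and is updated in place. Distance = mean absolute difference (assumed)."""
    send = []
    for br in range(current.shape[0] // n):
        for bc in range(current.shape[1] // n):
            r, c = br * n, bc * n
            cur = current[r:r + n, c:c + n].astype(float)
            prev = previous[r:r + n, c:c + n].astype(float)
            distance = np.mean(np.abs(cur - prev))
            if distance > threshold or ages[br, bc] >= max_age:
                send.append((r, c))
                ages[br, bc] = 0
            else:
                ages[br, bc] += 1
    return send

prev = np.zeros((16, 16))
cur = prev.copy()
cur[0:8, 8:16] = 50                      # motion confined to one block
ages = np.zeros((2, 2), dtype=int)
sent = blocks_to_update(cur, prev, ages, threshold=2.0, max_age=3)
print(sent)                              # only the changed block: [(0, 8)]
```

If the scene then stays static, the three untouched blocks are forced out on the third static frame, so a newly joined receiver sees the whole scene within a bounded interval.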
2. Differential Pulse Code Modulation


Another approach suitable for low-motion video is DPCM. DPCM is a first-order
predictor that uses only the most recent sample to predict the next sample. Denoting the
current frame as k and the reference frame as k - 1, DPCM subtracts the reference block
f(i,j,k - 1) from the predicted block f(i,j,k). The resulting error block e(i,j,k) represents
the prediction error between the predicted block and the reference block. Although little
correlation is left in the error block on average [48], the error block is compressed as
shown in Figure III.2, an approach known as hybrid video coding. If the prediction error
is small, the dynamic range of the pixels is considerably reduced, possibly down to zero,
and DCT-based coding yields higher compression than intracoding the original block,
since the error block has a predominantly low-pass characteristic.

Open-loop DPCM has the disadvantage that errors introduced by quantization
tend to accumulate over time at the decoder. Adding a feedback loop to the coder
mitigates this problem: the predicted block is compared to a reconstructed version of the
last frame maintained by the coder instead of the actual frame. Using the decoded frame
as a reference compensates for the quantizer error introduced by the coding process.
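The difference between open-loop and closed-loop DPCM shows up clearly in a toy sequence whose brightness rises by 0.4 per frame while the residual quantizer has a step of 1.0. The first frame is assumed to arrive intact (intracoded), and only the prediction residual is quantized; the DCT stage is omitted to isolate the drift effect.

```python
import numpy as np

def quantize(error, step):
    return step * np.round(error / step)

def dpcm(frames, step, closed_loop=True):
    """Decoder reconstructions for zero-motion DPCM. frames[0] is assumed
    to be received exactly (intracoded)."""
    decoded = [frames[0].astype(float)]
    encoder_ref = frames[0].astype(float)
    for k in range(1, len(frames)):
        error = frames[k] - encoder_ref              # prediction residual
        decoded.append(decoded[-1] + quantize(error, step))
        # Closed loop predicts from what the decoder actually holds;
        # open loop predicts from the original previous frame.
        encoder_ref = decoded[-1] if closed_loop else frames[k].astype(float)
    return decoded

frames = [np.full((4, 4), 0.4 * k) for k in range(11)]   # slow brightening
for closed in (False, True):
    out = dpcm(frames, step=1.0, closed_loop=closed)
    worst = max(np.abs(f - d).max() for f, d in zip(frames, out))
    print("closed" if closed else "open", round(float(worst), 2))
```

In the open-loop case each 0.4 residual rounds to zero, so the decoder never moves and the error grows to 4.0 after ten frames; with feedback the error stays within half a quantizer step.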
3. Forward Motion-Compensated Prediction


DPCM gives the best results when a scene is mostly static. With increasing
motion content, the probability of poor correlation between the predicted block and the
reference block increases. Past some point, DPCM actually yields inferior performance
relative to intracoding. Assume that the predicted block contains a discrete object, such
as a ball. If the ball does not move, DPCM gives good results since the best reference
block is at the same coordinates as the predicted block. If the ball is moving, the best-
matching reference block is offset relative to the predicted block, and DPCM delivers
poor results.

Motion compensation improves on DPCM by comparing the predicted block to some
region within the reference frame and finding the reference block that best matches the
predicted block. The best match is determined by some criterion such as minimum
distance or maximum correlation. Since the search process is computationally intensive,
real-time applications confine the search to a small region about the predicted block,
while off-line coding may search the entire reference frame. The resulting error block is
encoded as previously described under DPCM. Since the decoder needs the location of
the reference block, a motion vector accompanies the encoded error block. The motion
vector represents the location of the reference block as an offset (x,y) from the predicted
block. DPCM is a special case of forward motion compensation, using a motion vector
of (0,0).
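An exhaustive block-matching search over a small window, as used in real-time coders, can be sketched as follows. SAD is used as the distance metric, the search range p is a free parameter, and the zero vector (plain DPCM) is evaluated first so that it wins ties; these are illustrative choices, not a specific codec's algorithm.

```python
import numpy as np

def full_search(current, reference, top, left, n=16, p=7):
    """Exhaustive SAD block matching within +/-p pixels.

    Returns (dx, dy, sad): the offset of the best reference block relative
    to the n x n block at (top, left) in the current frame. The current
    block must lie inside the frame; out-of-frame candidates are skipped."""
    block = current[top:top + n, left:left + n].astype(np.int64)  # avoid uint8 wraparound
    rows, cols = reference.shape

    def sad_at(dx, dy):
        r, c = top + dy, left + dx
        if r < 0 or c < 0 or r + n > rows or c + n > cols:
            return None
        cand = reference[r:r + n, c:c + n].astype(np.int64)
        return int(np.abs(block - cand).sum())

    best = (0, 0, sad_at(0, 0))          # zero vector (DPCM) wins ties
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            sad = sad_at(dx, dy)
            if sad is not None and sad < best[2]:
                best = (dx, dy, sad)
    return best

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(48, 48)).astype(np.uint8)
current = np.roll(reference, shift=(1, 2), axis=(0, 1))   # content moves down 1, right 2
print(full_search(current, reference, top=16, left=16, n=16, p=4))   # (-2, -1, 0)
```

The cost is (2p + 1)² block comparisons per macroblock, which is why the search window is kept small in real-time coders.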

Motion vectors add overhead to the encoding process, with two implications for
video coding. First, interframe coders apply motion compensation at the macroblock level,
associating four blocks with a single motion vector to reduce overhead. Second, motion
compensation is employed only when a net gain in compression is possible over DPCM
or intracoding after taking the motion vector overhead into account. Most coders use the
distance metric to determine the most appropriate method for encoding each macroblock,
i.e., intercoding (either with motion compensation or DPCM) or intracoding.*

* The picture type may further influence the decision process, as in MPEG.
4. Bi-directional Motion Compensation


Forward motion compensation fails when no suitable reference exists in the
previous frame. Such a situation arises whenever a scene change occurs or when motion
reveals objects that were concealed in the previous frame. Bi-directional motion
compensation improves coding in these situations by selecting the best reference block
from either the previous frame or the subsequent frame. As before, the error block is
encoded and transmitted along with a motion vector and a flag indicating which frame
serves as the reference. The coder may also interpolate from the best matches in each
reference frame, although this approach requires transmission of two motion vectors.

The cost of adding bi-directional prediction is considerable and limits its
suitability to off-line or non-real-time compression. The need to search two reference
frames doubles both computational expense and buffer requirements. Also, the reliance
on past and future frames requires that both the coder and decoder delay processing of
the current frame until the subsequent frame is available.
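The prediction choice described above (forward, backward, or interpolated) reduces to comparing three candidate predictions. This sketch assumes the best-matching blocks in the past and future anchor frames have already been found by a motion search; the function name and the use of SAD as the criterion are assumptions.

```python
import numpy as np

def choose_bidirectional(block, forward_ref, backward_ref):
    """Pick the forward, backward, or interpolated (average) prediction with
    the smallest SAD; return the mode flag and the error block to encode."""
    block = block.astype(float)
    fwd, bwd = forward_ref.astype(float), backward_ref.astype(float)
    candidates = {"forward": fwd, "backward": bwd, "interpolated": (fwd + bwd) / 2}
    costs = {mode: np.abs(block - pred).sum() for mode, pred in candidates.items()}
    mode = min(costs, key=costs.get)
    return mode, block - candidates[mode]

b = np.zeros((8, 8))
print(choose_bidirectional(b, b + 2, b - 2)[0])   # averaging cancels opposite errors
print(choose_bidirectional(b + 5, b, b + 5)[0])   # newly revealed content: use the future frame
```

Only the interpolated mode needs two motion vectors; the other two modes need one vector plus the reference flag.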
5. Distance Metrics


In motion compensation, distance metrics are used to quantify the distortion
between a candidate reference block and the predicted block. The best-matching
reference block generates the least distortion. Three distance metrics commonly
employed are [6] [45]: mean squared error (MSE), sum of absolute differences (SAD),
and absolute sum of differences (ASD). For N×N blocks, the corresponding
mathematical expressions are given by:

$$\mathrm{MSE}(i,j) = \frac{1}{N^2}\sum_{m=0}^{N-1}\sum_{n=0}^{N-1}\left(x_{m,n} - x^*_{m+i,n+j}\right)^2$$

$$\mathrm{SAD}(i,j) = \sum_{m=0}^{N-1}\sum_{n=0}^{N-1}\left|x_{m,n} - x^*_{m+i,n+j}\right|$$

$$\mathrm{ASD}(i,j) = \left|\sum_{m=0}^{N-1}\sum_{n=0}^{N-1}\left(x_{m,n} - x^*_{m+i,n+j}\right)\right|$$

where x_{m,n} represents the pixel intensities within the predicted block while
x*_{m+i,n+j} represents the pixel intensities in the (possibly offset) reference block. The
reference block is offset relative to the predicted block by the motion vector (i,j).

Although several H.261 video codec implementations employ MSE as a distance 
measure [6], MSE requires expensive multiplication operations, which makes it less 
suitable for real-time applications. SAD and ASD require the less complex absolute 
value operator and otherwise require only addition operations. SAD was incorporated 
into the H.263 test model [55], an approach probably adopted by commercial 
implementations. ASD has found use in block updating since taking the absolute value
after the summation reduces the impact of noise introduced during video capture, thereby
reducing spurious background updates in low-motion video [45].
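The behavior of the three metrics on capture noise can be checked directly. In the sketch below, a zero-mean ±2 checkerboard stands in for sensor noise (an assumption for illustration): SAD and MSE register it, while ASD cancels it. The same cancellation means ASD can also miss genuine changes whose positive and negative differences balance out.

```python
import numpy as np

def mse(x, ref):
    return float(np.mean((x.astype(float) - ref) ** 2))

def sad(x, ref):
    return float(np.abs(x.astype(float) - ref).sum())

def asd(x, ref):
    return float(abs((x.astype(float) - ref).sum()))

ref = np.full((8, 8), 100.0)
noise = np.where(np.indices((8, 8)).sum(axis=0) % 2 == 0, 2.0, -2.0)  # zero-mean +/-2
noisy = ref + noise              # static block, noisy capture
changed = ref + 10.0             # genuine brightness change

for name, blk in (("noisy", noisy), ("changed", changed)):
    print(name, mse(blk, ref), sad(blk, ref), asd(blk, ref))
```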
6. Hybrid Video Coding


Hybrid video coding combines motion compensation with the DCT-based coder
shown in Figure III.2. A functional block diagram of a hybrid coder is shown in Figure
III.5. As in intracoding, the current frame is broken into a sequence of macroblocks,
and a separate coding decision is made for each macroblock. The motion estimation
block compares each macroblock to the reference frame(s) and decides whether
intracoding or intercoding is more appropriate. For example, Telenor’s H.263 test model
[55] employs a SAD-based coding decision algorithm. If intracoding is indicated, DCT-
based compression is applied to each individual block within the macroblock. If
intercoding is selected, the reference macroblock is subtracted from the predicted
macroblock, and the error block is encoded. The motion vector is encoded separately
using a VLC, although motion vectors are optional for simple DPCM.

Figure III.5 also illustrates the feedback path used to prevent the accumulation of
quantization errors at the decoder. After each macroblock is quantized, the quantization
and transform operations are reversed, and the results are used to update the reference
frame. Not shown is the controller functionality. The controller implements either open-
loop or closed-loop rate control, in coordination with the network, by controlling the
distortion introduced in the quantizer and by controlling the encoding decisions available
to the motion estimation block.
Figure III.5: Hybrid Video Coder with Motion Compensation and DCT-based
Compression.
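The coding loop of Figure III.5 can be sketched for a single frame as follows. To stay short, the sketch assumes zero motion vectors (plain DPCM), one uniform quantizer step, no entropy coding, and no rate control. Its intra/inter rule (compare a macroblock's deviation from its own mean with its SAD against the reference) is a simplified assumption, not the decision rule of the Telenor test model.

```python
import numpy as np

N = 8
_u = np.arange(N).reshape(-1, 1)
C = np.sqrt(2.0 / N) * np.cos((2 * np.arange(N) + 1) * _u * np.pi / (2 * N))
C[0] /= np.sqrt(2.0)                      # orthonormal 8x8 DCT basis

def code_blocks(mb, step):
    """DCT, quantize, then invert each 8x8 block: the decoder's view of mb."""
    out = np.empty(mb.shape, dtype=float)
    for r in range(0, mb.shape[0], N):
        for c in range(0, mb.shape[1], N):
            levels = np.round((C @ mb[r:r + N, c:c + N] @ C.T) / step)
            # (levels would be run-length and VLC coded here)
            out[r:r + N, c:c + N] = C.T @ (levels * step) @ C
    return out

def encode_frame(current, reference, step=16.0, mb=16):
    """Simplified hybrid coder pass with zero motion vectors.
    Returns per-macroblock modes and the reconstructed frame, which the
    feedback path uses as the next reference."""
    current = current.astype(float)
    recon = np.empty_like(current)
    modes = []
    for r in range(0, current.shape[0], mb):
        for c in range(0, current.shape[1], mb):
            x = current[r:r + mb, c:c + mb]
            pred = reference[r:r + mb, c:c + mb]
            intra_cost = np.abs(x - x.mean()).sum()
            inter_cost = np.abs(x - pred).sum()
            if intra_cost < inter_cost:
                modes.append("intra")
                recon[r:r + mb, c:c + mb] = code_blocks(x, step)
            else:
                modes.append("inter")
                # Feedback: decoded residual added back to the prediction.
                recon[r:r + mb, c:c + mb] = pred + code_blocks(x - pred, step)
    return modes, recon

first = np.full((32, 32), 100.0)
modes, recon = encode_frame(first, np.zeros((32, 32)))
print(modes)                              # nothing to predict from: all intra
```

Adding a motion search would replace the co-located prediction `pred` with the best-matching offset block and send its vector alongside the residual.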
E. ERROR ROBUSTNESS


Transmission errors are an inevitable part of any communication network and
occur both within the channel and within the network. Communication channels are
characterized by bit error rate (BER), typically 10^-9 for fiber-optic systems and
considerably higher for copper-based wireline and wireless systems. Random bit errors or
burst errors due to channel noise may corrupt either the payload or the packet header.
Packet header errors are the more serious of the two, raising the potential for misrouted
packets or preventing the network from identifying the packet. Losses may be mitigated
with forward error correction and retransmissions, but the latter approach is untenable
with real-time traffic. ATM networks check for errors only in the cell header and are
able to correct single-bit errors [18]. If multiple bit errors are detected, the cell is
discarded. The AAL at the receiver may handle payload bit errors or leave error
handling to higher layers. Network losses occur due to buffer overruns at network nodes
during periods of congestion or when the arriving aggregate traffic prevents the switch
from servicing each connection to its required QoS. Although network architectures
such as ATM allow a call to specify cell loss probability prior to call acceptance, cell
losses do occur, especially if the transmission path employs a wireless interface. The
impact of transmission errors depends on the error resilience of the codec.
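The ATM header check mentioned above uses the header error control (HEC) octet, a CRC-8 with generator x^8 + x^2 + x + 1 computed over the first four header octets and XORed with 0x55, as specified in ITU-T I.432. The sketch below computes the HEC and corrects single-bit errors by brute force over the 40 header bits. This stands in for a syndrome table, and the I.432 receiver state machine (switching between correction and detection modes) is omitted; function names are hypothetical.

```python
def hec(header4):
    """HEC octet: CRC-8 (generator x^8 + x^2 + x + 1) over four header
    octets, MSB first, XORed with the coset pattern 0x55."""
    crc = 0
    for byte in header4:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ 0x55

def check_header(header5):
    """Return ('ok' | 'corrected' | 'discard', header). Tries each of the
    40 single-bit flips; multi-bit errors are detected and discarded."""
    header5 = bytes(header5)
    if hec(header5[:4]) == header5[4]:
        return "ok", header5
    for bit in range(40):
        trial = bytearray(header5)
        trial[bit // 8] ^= 0x80 >> (bit % 8)
        if hec(trial[:4]) == trial[4]:
            return "corrected", bytes(trial)
    return "discard", header5

idle = bytes([0x00, 0x00, 0x00, 0x01])      # idle-cell header octets
cell_header = idle + bytes([hec(idle)])
print(hex(hec(idle)))                       # 0x52
```

Because the generator contains the factor (x + 1) and the code length is only 40 bits, any two-bit error lies at distance 2 from a valid header and cannot be miscorrected into another one, so such cells are discarded rather than silently misrouted.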

Each cell loss or bit error degrades the quality of the reconstructed video stream
through two mechanisms, depending on the type of video coding employed. Assume that
a transmission error occurs such that a single macroblock is decoded incorrectly. The
immediate impact is spatial corruption within the current frame [7]. Since the error
disrupts the decoder’s synchronization with the bit stream, the corruption spreads
spatially in scanline fashion until the decoder locates a valid symbol for
resynchronization. Therefore, the visual corruption usually spreads through the
remainder of the parent GOB or to the end of the frame.

With intraframe coding, spatial errors do not persist beyond the affected frame
since each frame is coded independently. Interframe coding, while giving greater
compression gains, increases the impact of spatial errors by providing a propagation path
through subsequent frames. Again, consider the presence of one or more corrupted
blocks in the last decoded frame. In interframe coding, the last decoded frame serves as a
reference for predictive coding. Any error block received in the current frame that
references a corrupted block yields another corrupted block. Therefore, spatial
corruption propagates temporally. With motion compensation enabled, scene motion
carries decoding errors spatially through the scene. This is particularly distracting since
the human eye tends to follow motion [7]. The duration of temporal errors is dictated by
the rate at which intracoded macroblocks are transmitted, which is in turn dictated by the
codec. Factors affecting the relative error resilience of several popular codecs are
presented below.
1. Motion JPEG 


Motion JPEG treats the video stream as a sequence of still images, compressing
each frame using JPEG. Since each frame is encoded independently, decoding errors are
limited to the duration of the affected frame.
2. MPEG


MPEG-1 and MPEG-2 are designed to deliver high-quality audio-video compression for applications such as CD-ROM multimedia, broadcast digital video, and high definition TV. MPEG employs the GOP format shown in Figure III.1 to provide a tradeoff between compression gain and random access within the video stream [6]. A GOP includes three picture types; each picture type limits the allowable macroblock types. I- and P-pictures are anchor pictures and serve as reference frames. I-pictures allow only intracoded macroblocks. P-pictures allow intracoding and forward motion prediction from the last anchor picture. B-pictures allow intracoding, bi-directional motion prediction, and interpolation, and use the last and next anchor frames as references. Although not specified by the MPEG standard, an N-picture GOP normally starts with an I-picture followed by a P-picture every M frames. The remaining frames are encoded as B-pictures as shown in Figure III.6. A greater value of N offers greater compression gain at the expense of random access, since the decoder must start at an I-picture.
Figure III.6: Typical GOP, N = 9, M = 3.
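The conventional GOP arrangement just described can be sketched as a short function that assigns picture types in display order. This is a minimal illustration, assuming the common (not mandated) pattern of an I-picture first, a P-picture every M frames, and B-pictures elsewhere; the function name `gop_pattern` is hypothetical.

```python
def gop_pattern(n: int, m: int) -> str:
    """Return the picture types of an n-picture GOP in display order.

    Assumes the common, non-mandatory arrangement: frame 0 is an
    I-picture, every m-th frame after it is a P-picture, and all
    remaining frames are B-pictures.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")  # intracoded anchor that starts the GOP
        elif i % m == 0:
            types.append("P")  # forward-predicted anchor
        else:
            types.append("B")  # bi-directionally predicted picture
    return "".join(types)


print(gop_pattern(9, 3))  # IBBPBBPBB, the GOP of Figure III.6
```

Only the I- and P-positions serve as anchors, so an error in any of them can reach every later picture in the GOP.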


If an error occurs in any anchor picture, errors may propagate through the remaining pictures in the GOP until the next I-picture is received. An I-picture decode error is the worst case and results in the longest propagation cycle. Since MPEG employs motion compensation, decoding errors propagate spatially as well as temporally and have been observed to grow and shrink depending on motion within the frame.
3. H.263 


The ITU standard H.263 defines a low-bit-rate video codec for video transmission over the PSTN using V.34 modems. H.263 is optimized for bit rates of 28.8 kbps and less and offers quality superior to MPEG at bit rates below 64 kbps.

H.263 employs the video hierarchy shown in Figure III.1 without the GOP structure. H.263 coding resembles the concept of MPEG P-pictures. All coding decisions are made at the macroblock level, and each macroblock is either intracoded or intercoded using forward motion compensation. To bound error propagation, the standard specifies that a macroblock must be intracoded at least once every 132 frames [56]. The lack of an equivalent to the I-picture that refreshes every macroblock at once, while deliberate, leaves H.263 vulnerable to prolonged error propagation. Even with the mandatory spacing of intracoded blocks, some types of motion lead to almost indefinite error propagation [8].
4. Error Propagation


To place error resilience in context, consider the worst-case error propagation using M-JPEG, MPEG, and H.263 compression under the scenario summarized in Table I.1. With M-JPEG, an error in one frame is corrected upon receipt of the next frame. The robust nature of M-JPEG makes it suitable for broadband video conferencing [9]. Error propagation in MPEG depends on the GOP size. A typical reported GOP size is twenty pictures and, given that an error occurs in the I-picture, the worst-case propagation is twenty frames. For an H.263 coded stream, the worst-case error propagation depends on how often individual macroblocks are intracoded. The H.263 standard specifies a maximum limit of 132 frames between updates [56]. Assuming an error occurs in an intracoded block and the block is not intracoded again for 132 frames, the error could persist for 132 frames and possibly even longer given the right motion patterns [8]. Table III.1 summarizes the worst-case error duration for each of the three codecs at a frame rate of 10 fps.


Coding Scheme    Worst-case error propagation (seconds)
M-JPEG           0.10
H.263            13.20
MPEG             2.00

Table III.1: Error Propagation in Popular Video Codecs. 
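The entries in Table III.1 follow from dividing the worst-case number of affected frames by the frame rate. The sketch below reproduces that arithmetic. The frame counts are the ones given in the text (one frame for M-JPEG, a 20-picture GOP for MPEG, 132 frames for H.263). The names are illustrative, and the H.263 figure is a lower bound on the worst case, since unfavorable motion can extend it.

```python
FRAME_RATE_FPS = 10.0

# Worst-case number of frames an error can persist, per the discussion above.
WORST_CASE_FRAMES = {
    "M-JPEG": 1,   # error cleared by the next independently coded frame
    "H.263": 132,  # maximum spacing between forced intracoded updates [56]
    "MPEG": 20,    # 20-picture GOP with the error in the I-picture
}


def worst_case_seconds(frames: int, fps: float = FRAME_RATE_FPS) -> float:
    """Convert a worst-case propagation length in frames to seconds."""
    return frames / fps


for codec, frames in WORST_CASE_FRAMES.items():
    print(f"{codec:7s} {worst_case_seconds(frames):6.2f} s")
```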


F. SUBBAND AND WAVELET CODING 


Subband and wavelet coding are additional techniques for compressing still images and have been shown to offer slightly better image quality than DCT-based schemes for similar levels of compression, at the cost of greater computational complexity [50]. Subband and wavelet coding are fundamentally similar in that both decompose the image into regions representing different bands of spatial frequencies present in the image. Subband coders apply a series of filters to the image and then decimate the resulting bands to avoid oversampling, while wavelet coders perform filtering and decimation simultaneously [48]. Of the two methods, wavelet techniques are more common and are examined further here.
In contrast to the DCT, a discrete wavelet transform (DWT) filters and decimates an image into regions containing mixtures of the high- and low-frequency details within the image. Decomposition is performed using two analysis filters. The first extracts low-frequency content, the signal average, and the other extracts high-frequency content, the signal details. Example analysis filters for a four-tap biorthogonal DWT are given by [48]:


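$$H_0(z) = -1 + 3z^{-1} + 3z^{-2} - z^{-3} \tag{III-11}$$
$$H_1(z) = -1 + 3z^{-1} - 3z^{-2} + z^{-3} \tag{III-12}$$

The inverse transform is performed using the following synthesis filters:

$$G_0(z) = \left(1 + 3z^{-1} + 3z^{-2} + z^{-3}\right)/16 \tag{III-13}$$
$$G_1(z) = \left(-1 - 3z^{-1} + 3z^{-2} + z^{-3}\right)/16 \tag{III-14}$$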
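Whether an analysis/synthesis pair inverts exactly can be checked with two polynomial identities for a two-channel filter bank. The distortion term H0(z)G0(z) + H1(z)G1(z) must be a pure delay, and the aliasing term H0(-z)G0(z) + H1(-z)G1(z) must vanish. The sketch below evaluates both identities with exact rational arithmetic for the filters of Eqs. (III-11) through (III-14). The helper names are illustrative.

```python
from fractions import Fraction

# Filter taps in ascending powers of z^{-1}, from Eqs. (III-11)-(III-14).
H0 = [-1, 3, 3, -1]                             # analysis low-pass
H1 = [-1, 3, -3, 1]                             # analysis high-pass
G0 = [Fraction(c, 16) for c in (1, 3, 3, 1)]    # synthesis low-pass
G1 = [Fraction(c, 16) for c in (-1, -3, 3, 1)]  # synthesis high-pass


def conv(a, b):
    """Polynomial product (linear convolution) of two coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def modulate(h):
    """Coefficients of h(-z): negate the odd powers of z^{-1}."""
    return [c if k % 2 == 0 else -c for k, c in enumerate(h)]


def add(a, b):
    """Coefficient-wise sum of two equal-length polynomials."""
    return [x + y for x, y in zip(a, b)]


distortion = add(conv(H0, G0), conv(H1, G1))
aliasing = add(conv(modulate(H0), G0), conv(modulate(H1), G1))
print([int(x) for x in distortion])  # [0, 0, 0, 2, 0, 0, 0]
print([int(x) for x in aliasing])    # [0, 0, 0, 0, 0, 0, 0]
```

The distortion term reduces to 2z^{-3}, a gain of two and a delay of three samples. The gain of two offsets the factor lost to decimation. The aliasing term cancels, so the filter bank reconstructs the input exactly apart from that fixed delay.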


Image compression proceeds as shown in Figure III.7. A first-order decomposition creates four 2-D subbands from the original image. Each subband results from the appropriate application of the analysis filters in the horizontal and vertical directions and decimation by a factor of two. For example, applying Eq. (III-11) in both the horizontal and vertical directions generates the LL subband. Applying Eq. (III-11) in the horizontal direction and Eq. (III-12) in the vertical direction results in the HL subband. The remaining subbands are obtained in a similar manner. Each subband captures certain image features. The LL subband retains the low-pass information within the image and displays a coarse representation of the original image. Since most images have a low-pass characteristic, most of the image's energy is found in the LL subband. High-frequency information results from edges, which provide visual cues for image recognition. The HL and LH subbands contain vertical and horizontal edge information, respectively, while the HH subband contains diagonal edge information.
Figure III.7: DWT-based Image Compression.
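The split into four subbands can be illustrated with the simplest filter pair, the two-tap Haar filters, in place of the four-tap filters above. The sketch performs a single-level, unnormalized 2-D Haar analysis using only additions and subtractions. The function name is hypothetical, and so is the scaling, which keeps sums rather than averages. Subband naming conventions vary between authors. In this sketch "HL" denotes the band that takes differences along each row, which responds to vertical edges.

```python
def haar_2d_level(img):
    """One level of an unnormalized 2-D Haar decomposition.

    img is a list of rows with even height and width. Each 2x2 pixel
    group (a b / d e) yields one coefficient in each of four subbands,
    using only additions and subtractions.
    """
    height, width = len(img), len(img[0])
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, height, 2):
        ll_row, hl_row, lh_row, hh_row = [], [], [], []
        for c in range(0, width, 2):
            a, b = img[r][c], img[r][c + 1]          # top-left, top-right
            d, e = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            ll_row.append(a + b + d + e)      # local sum (4x the average)
            hl_row.append((a - b) + (d - e))  # row-wise difference: vertical edges
            lh_row.append((a + b) - (d + e))  # column-wise difference: horizontal edges
            hh_row.append((a - b) - (d - e))  # diagonal detail
        ll.append(ll_row)
        hl.append(hl_row)
        lh.append(lh_row)
        hh.append(hh_row)
    return ll, hl, lh, hh


# A vertical edge (bright left column) appears only in LL and HL.
print(haar_2d_level([[10, 0], [10, 0]]))  # ([[20]], [[20]], [[0]], [[0]])
```

Because each output is a sum or difference of four pixels, repeating the operation on the LL band yields the octave-band structure without any multiplications.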


The wavelet transform is invertible and lossless and, like the DCT, produces no compression gain by itself. The compression gain results from quantization and entropy coding of the wavelet coefficients. As with the DCT, the higher-frequency coefficients tend to be less significant, so most of the compression gain is realized from compacting the detail subbands, especially the HH subband. In the layered coder proposed by McCanne and Vetterli, the HH subband is discarded entirely [45]. Subbands are usually quantized independently. The LL subband behaves much like the original image and can be compressed using traditional transform-based techniques such as JPEG [57]. The remaining subbands are uniformly quantized using a step size proportional to the variance of the coefficients in that subband [48]. Since the higher subbands tend to have a large number of zeros following quantization, run-length encoding (RLE) and entropy encoding significantly increase compression. Zig-zag reordering provides no advantage in the upper bands, so RLE proceeds in scanline fashion, either horizontally or vertically. Alternatively, the quantized coefficients are grouped and vector Huffman encoded [58].
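The quantization and scanline run-length steps for a detail subband can be sketched as follows. Following the text, the step size is proportional to the subband's coefficient variance. The constant `k` is a hypothetical tuning parameter, and the (zero run, value) pair format and the end-of-subband marker are illustrative choices rather than any particular coder's bitstream.

```python
def quantize_subband(coeffs, k):
    """Uniformly quantize one subband (a list of rows).

    The step size is k times the variance of the subband's coefficients;
    k is a hypothetical tuning constant. Returns (quantized rows, step).
    """
    flat = [x for row in coeffs for x in row]
    mean = sum(flat) / len(flat)
    variance = sum((x - mean) ** 2 for x in flat) / len(flat)
    step = k * variance if variance > 0 else 1.0  # avoid a zero step size
    quantized = [[round(x / step) for x in row] for row in coeffs]
    return quantized, step


def run_length_scanline(quantized):
    """Horizontal scanline RLE producing (zero_run, value) pairs.

    The final pair (trailing_zeros, 0) is a hypothetical end-of-subband
    marker telling the decoder how many zeros remain.
    """
    pairs = []
    run = 0
    for row in quantized:
        for value in row:
            if value == 0:
                run += 1
            else:
                pairs.append((run, value))
                run = 0
    pairs.append((run, 0))
    return pairs


q, step = quantize_subband([[2, -2], [2, -2]], k=0.5)
print(step, q)                                   # 2.0 [[1, -1], [1, -1]]
print(run_length_scanline([[0, 0, 3], [0, -1, 0]]))  # [(2, 3), (1, -1), (1, 0)]
```

A vertical scan would simply iterate over columns instead of rows. The choice can follow the dominant edge orientation of the subband.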

Greater compression is possible by further decomposing the image. Figure III.8 displays a second-order octave-band decomposition obtained by applying the analysis filters to the LL subband as described above. A higher-order decomposition is generated by repeatedly decomposing the lowpass subband. The lowpass band is quantized using transform-based techniques while the remaining subbands are quantized as described above. The increase in the number of bands allows quantization and encoding to be further tailored to emphasize perceptual details over less perceptible background noise. Alternatively, the interdependencies among the subbands can be exploited using zero-tree entropy coding [59]. Zero-tree coding is analogous to zig-zag scanning in DCT-based compression. The tree grows from a single coefficient in each of the low-frequency bands and gathers coefficients in higher-frequency bands that correspond to the same spatial location in the original image. Each additional level of decomposition multiplies the number of coefficients gathered at that level by four. Zero-tree encoding combines elegantly with bit allocation since encoding may stop once the target bit rate is met. Conversely, the decoder may stop once a desired level of quality is achieved.





Figure III.8: Octave-band Decomposition. 
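The parent-child relationship used by zero-tree coders can be sketched as follows. This is a simplified illustration under the common convention that coefficient (r, c) in one band has the four children (2r, 2c) through (2r+1, 2c+1) in the next finer band of the same orientation. The function names are hypothetical. The special handling of roots in the lowest-frequency band, which have children in three orientations, is omitted.

```python
def children(r, c):
    """Positions of the four children of coefficient (r, c) in the next finer band."""
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]


def tree_levels(r, c, depth):
    """Positions in the tree rooted at (r, c), one list per decomposition level."""
    levels = [[(r, c)]]
    for _ in range(depth):
        next_level = []
        for pr, pc in levels[-1]:
            next_level.extend(children(pr, pc))
        levels.append(next_level)
    return levels


def is_zerotree(bands, r, c, threshold):
    """True if every coefficient in the tree is insignificant (|x| < threshold).

    bands[0] is the coarsest band of one orientation; each later band has
    twice the height and width of the one before it.
    """
    for band, positions in zip(bands, tree_levels(r, c, len(bands) - 1)):
        for y, x in positions:
            if abs(band[y][x]) >= threshold:
                return False
    return True


print([len(level) for level in tree_levels(0, 0, 3)])  # [1, 4, 16, 64]
```

When `is_zerotree` holds, a single symbol can stand for the whole tree, which is where the coding gain comes from.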


Wavelet-based compression schemes offer some advantages over DCT-based schemes. DCT-based approaches achieve compression gain by removing high-frequency content from the image, zeroing the high-frequency coefficients during quantization. Wavelet transforms separate the image into regions of high- and low-frequency content, thus allowing more efficient bit allocation since different regions may be quantized and coded differently. This is advantageous since the DWT coder has the option of preserving more or less edge detail to improve perceptual image quality at pSNR comparable to the DCT. Another advantage is that wavelet transforms are applied to the entire image rather than to blocks within the image. Therefore, at low pSNR, where the DCT exhibits blocking artifacts, wavelet transforms typically display a more visually pleasing smoothing effect. In general, at comparable pSNR, wavelet transform coders offer greater compression gains than DCT-based coders. When comparing state-of-the-art coders, wavelet-based coders offer a 1 dB improvement in pSNR over DCT-based coders [50].
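The pSNR comparisons above use the usual definition for 8-bit images, 10 log10(255^2 / MSE). A minimal sketch of the computation follows, assuming both images are equal-sized lists of rows of 8-bit values. The function name is illustrative.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [
        (a - b) ** 2
        for row_o, row_d in zip(original, decoded)
        for a, b in zip(row_o, row_d)
    ]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


print(psnr([[100, 100], [100, 100]], [[101, 99], [100, 100]]))
```

Because pSNR is logarithmic in the MSE, a 1 dB improvement corresponds to reducing the mean squared error by a factor of 10^0.1, or about 1.26.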

Several drawbacks relative to DCT-based compression have limited the utility of wavelet-based video compression. Wavelets achieve quality superior to DCT methods by processing the entire image or frame, whereas motion-compensated video coding exploits temporal correlation at the macroblock level. Although the prediction error block could be transformed via a DWT, no significant advantage over the DCT has been demonstrated, and the computational effort is greater [50]. Many software and hardware "fast" implementations of the DCT require less than one multiplication per coefficient, whereas wavelet transforms usually require at least one multiplication per coefficient.⁸


This chapter presented the tools required for compressing motion video: transform methods, quantization, and entropy coding. These tools can be applied to individual frames independently, as in intraframe coding, or used in conjunction with prediction schemes that capture frame-to-frame correlation, as in interframe coding. An important consideration is that the choice of methods impacts both the complexity and the error robustness of the coder. Therefore, codec suitability for a particular application is to some degree dependent on the host networking environment. Wavelet-based coding allows flexibility in frequency content selection to improve compression. The frequency decomposition offered by DWTs also provides a powerful tool for devising more robust schemes for video transmission, as detailed in the next chapter.


⁸ The fast Haar transform is the exception; it requires no multiplication operations [60].


IV. LOW-COMPLEXITY LAYERED VIDEO CODING 


Current coding standards, such as H.263 and MPEG, make no explicit allowance for network transmission and are severely degraded by both bit errors and packet losses [7]. Packet losses are preventable to some extent with proper QoS guarantees, but losses due to congestion still occur. Of further concern is the fact that tactical wireless links exhibit much higher BERs than wireline connections. Putting aside BER as outside the control of network applications, most approaches to reducing the impact of congestion involve feedback-based rate-control schemes that change the coder's quantization, resolution, or frame rate. As discussed in Chapter II, RTP provides a framework for a multimedia application to gauge the level of congestion within the network via receiver reports and vary its target bit rate accordingly.

A second drawback is the poor flexibility exhibited by traditional video codecs in multicast scenarios when video is transmitted over heterogeneous, packet-based networks. These codecs transmit the video signal as a single stream of packets. The combination of a single video stream and a heterogeneous network suffers from many limitations [12]. Consider the problem of delivering video to a multicast group consisting of several recipients connected over the heterogeneous network shown in Figure IV.1. Examining the transmission paths leading from the sender to the different recipients reveals an obvious stratification in available bandwidth.⁹ In this scenario, the sender faces a dilemma when selecting an appropriate encoder quality. Transmitting high-quality, high-bandwidth video is both acceptable and desirable for some recipients. However, low-bandwidth recipients will experience high packet loss with a commensurate degradation in received video quality. Supporting the lowest common denominator forces all recipients to view lower quality video, thereby underutilizing high-bandwidth links and leaving those recipients dissatisfied.


⁹ A similar heterogeneity could exist in each user's processing and display capabilities.
Figure IV.1: Video Transmission over a Heterogeneous Network from [45].


This chapter addresses these concerns by considering a layered video coder that is more suitable for network transmission. The concept of layered coding, especially in the context of receiver-based layered multicast (RLM), and previous layered coder proposals are examined. The chapter's primary focus is a new SNR-scalable layered coding scheme appropriate for tactical applications, with emphasis on robust transmission and low complexity. Error robustness is provided by eschewing motion prediction in favor of macroblock updating, which significantly limits the temporal duration of decode errors and eliminates any spatial migration. Layering is accomplished via the fast Haar transform (FHT), with the exact layering structure tailored to video content. The VTC session is assumed to consist of both low-motion video, such as a "talking head", and static displays, such as slide presentations. Handling both types of content with a single layering scheme requires unacceptable compromises since the frequency characteristics of each are different. Therefore, the coder is optimized to handle each type of content separately by including separate layering structures and custom VLC tables. Finally, the rate control problem is examined, and an approach is proposed to reduce a k-dimensional rate-control problem to a simple 1-D table lookup.
A. BACKGROUND 


Several approaches are available to meet the diverse quality expectations in the multicast group. The sender could encode the input video as a series of separate streams, where each stream targets a different quality level and target bit rate. Each stream is then transmitted to a different multicast group, and recipients subscribe to the multicast group offering the desired quality and bit rate. A multicast group such as that shown in Figure IV.1 would potentially require targeting three different bandwidths. However, separate encoding presents some liabilities [45]. Transmitting several streams duplicates content and requires far more bandwidth. Encoding several streams simultaneously requires considerably more computational effort than a single stream and limits this approach primarily to non-interactive video-on-demand applications. Another approach is to use transcoding at routers, wherein a high-quality video stream is decoded and then re-encoded at a lower quality for further transmission on a lower bandwidth network [45]. However, transcoding requires specialized hardware in the transmission path, and the additional delay introduced in reprocessing the video stream makes it less suitable for interactive applications.

As discussed above, feedback messages allow the sender to estimate network conditions and adapt to the onset of congestion, thereby reducing the load on the network and ensuring that all recipients receive a minimal level of quality. RTP provides a mechanism for receiver reports but leaves the interpretation of those reports, and the resulting changes, to the application. Other schemes have been developed mainly for use over LANs but could be adapted for multicast applications hosted over an ATM network. One scheme proposed by Bolot and Turletti [61] employs negative acknowledgements to indicate network state when the number of recipients is ten or less and uses QoS messages sent periodically with some probability. Sakatani [62] uses collisions detected at the MAC level and round-trip delay to measure the effect of congestion. Once congestion has occurred, quantization and frame rate are adjusted to reduce the output to a "slow start" bit rate. If indications of congestion disappear, the original bit rate is resumed. Other schemes have been proposed in [63]-[65].

However, heterogeneous networks complicate the application of feedback-based rate-control schemes. In a multicast environment, each recipient in a VTC may observe different degrees of congestion. The sender's task of interpreting the network state and making appropriate changes is greatly complicated when receiver reports indicate that congestion affects only a small subset of the multicast group. An aggressive response lowers quality for the entire multicast group, while a more conservative response tacitly drops some recipients, at least temporarily. Feedback-based control in general is problematic. With high-bandwidth networks, rate-control schemes may not respond fast enough to be beneficial. In low-bandwidth networks, any feedback scheme consumes bandwidth, although most attempt some form of conservation. For example, RTP scales the receiver report rate to the size of the multicast group. Still, the notion of rate control leads back to the issue that selecting a single level of video quality in a heterogeneous environment is problematic.
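The way RTP scales the receiver report rate to the group size can be illustrated with a simplified form of the RTCP interval rule from RFC 3550. Control traffic is held to about 5% of the session bandwidth, so the report interval grows linearly with the number of members, subject to a recommended minimum of 5 seconds. The sketch omits the specification's sender/receiver bandwidth split, its randomization of the interval, and its reconsideration algorithm. The function name and example numbers are assumptions.

```python
def rtcp_report_interval(members, avg_rtcp_packet_bytes, session_bw_bps,
                         rtcp_fraction=0.05, min_interval_s=5.0):
    """Simplified deterministic RTCP report interval in seconds.

    All members share rtcp_fraction of the session bandwidth, so each
    member reports once every members * packet_size / rtcp_bandwidth
    seconds, but never more often than min_interval_s.
    """
    rtcp_bw_bps = rtcp_fraction * session_bw_bps
    interval = members * avg_rtcp_packet_bytes * 8 / rtcp_bw_bps
    return max(min_interval_s, interval)


# Hypothetical 64 kbps session with 100-byte reports.
for n in (10, 100, 1000):
    print(n, rtcp_report_interval(n, 100, 64_000))
```

With ten members the 5-second floor dominates. With a thousand members, each receiver reports only every few minutes. The aggregate feedback stays bounded, but it becomes too sparse for a sender to track rapidly changing congestion.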

Layered video coding, especially in the framework of receiver-based layered 
multicast (RLM) [45], provides a solution to the shortcomings outlined above. A 
layered-video coder encodes the video stream as a base layer and a series of enhancement
layers, arranged in a hierarchical fashion. The base layer provides a minimum acceptable 
level of quality while the enhancement layers progressively refine the quality of the 
received video sequence. 

Layered video coding with RLM offers greater flexibility in handling the video stream by moving bandwidth management from the sender to the network and the individual recipients. The sender generates a layered video stream at the highest quality (bandwidth) supported by the network to which it is directly attached. Each member of the multicast group then subscribes to some or all of the layers. The exact number depends on available bandwidth and the video quality desired. If high packet losses are experienced, the recipient drops layers until satisfactory reception is obtained. Within the network, the video stream traverses a heterogeneous mixture of subnets. Each subnet carries the maximum number of layers within the bandwidth available, retaining the most perceptually important layers and dropping the rest. Figure IV.2 shows this approach using the heterogeneous network portrayed in Figure IV.1. Transmitting the video stream as a series of scalable layers maximizes utilization of each link and maximizes the video quality available to each recipient.
Figure IV.2: Video Transmission Using RLM.
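The rule that each subnet carries the maximum number of layers within its available bandwidth can be sketched as a cumulative fit. An enhancement layer is useful only if every layer beneath it is also received. The per-layer rates below are hypothetical. Actual RLM receivers discover the available capacity through join and leave experiments rather than knowing it in advance.

```python
def layers_supported(layer_rates_kbps, capacity_kbps):
    """Number of layers, base layer first, that fit within a link's capacity.

    Layers are cumulative, so the search stops at the first layer that
    does not fit rather than skipping ahead to a smaller one.
    """
    total = 0.0
    count = 0
    for rate in layer_rates_kbps:
        if total + rate > capacity_kbps:
            break
        total += rate
        count += 1
    return count


# Hypothetical four-layer stream: a base layer plus three enhancement layers.
LAYER_RATES_KBPS = [32, 32, 64, 128]
for link_kbps in (16, 64, 128, 1000):
    print(link_kbps, layers_supported(LAYER_RATES_KBPS, link_kbps))
```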





RLM as originally described by McCanne et al. [45] implicitly provides congestion control without feedback via recipient subscriptions. When experiencing high packet loss, recipients have the option of dropping the less important layers. As layers are dropped, routers stop forwarding their packets, thus preserving bandwidth for more perceptually important layers. This allows more graceful degradation in video quality in the presence of both congestion and other changes in network loading. The sender does not play an active role in congestion control, although receiver reports could be used to drop or manipulate the upper layers. RLM can be improved by providing QoS guarantees for each layer and exploiting the hierarchical nature of layered video in network scheduling decisions. Chapter II discussed methods for multicast transmission of layered video with QoS guarantees using ATM; scheduling algorithms for layered video are covered in Chapter VI.

RLM also does not explicitly increase error resilience, except insofar as each subnet carries only those layers capable of being transmitted without excessive packet losses. However, research [13] indicates that layered video provides more error resilience than a single video stream of similar bandwidth. Spreading errors across multiple layers means that fewer errors occur in the base layer relative to a single stream, and errors in the enhancement streams are less noticeable. With ATM networking, QoS can be negotiated asymmetrically to ensure that fewer errors occur in the most important layers.
B. LAYERED VIDEO CODING


Delivering layered, scalable video involves considerations in addition to those covered in the last chapter for traditional coders. The primary concern is effectively separating the video stream into hierarchical layers as shown in Figure IV.3. The video stream consists of a base layer that offers acceptable quality and a series of enhancement layers that progressively improve quality in terms of pSNR, frame rate, or resolution. An effective layering scheme creates layers that provide gradual but perceptible increases in video quality; transmitting an additional layer that does not improve quality merely wastes bandwidth. An effective layering scheme should also create the layering hierarchy without significantly increasing computational expense compared to encoding a single stream, and with minimal additional bitstream overhead.
Figure IV.3: Overview of Layered Video Coding/Decoding.


Next, we consider some basic approaches for implementing the layering operation implied in Figure IV.3. Two avenues are considered. First, progressive image refinement schemes, such as progressive JPEG and pyramid coding, easily extend to layered coding. Second, as mentioned in Section III.F, multiresolution techniques employing subband/wavelet image coding extend in a natural fashion to layered coding. Each of these techniques is explored and illustrated with past and current research on layered coder design.
1. Progressive JPEG Encoding


Progressive encoding is one of the four encoding modes defined in the JPEG standard and represents an extension to the baseline sequential coder presented in Figure III.2 [66]. Progressive JPEG prepares the image for encoding in the same manner: the image is broken into 8x8 blocks, transformed with the 2-D DCT, and quantized using either the JPEG standard tables or customized tables. The difference lies in the manner in which the quantized DCT coefficients are encoded. Progressive coders segment the DCT coefficients and encode them in multiple passes, with each pass containing a subset of the frequency content. The goal is to first transmit the most perceptually important frequency content and then progressively improve quality with the remaining passes. Segmentation is performed via spectral selection or successive approximation.

As shown in Figure III.3, the DCT coefficients are arranged from low-frequency components in the upper left corner to high-frequency components in the lower right corner. Spectral selection segments DCT coefficients into spectral bands for encoding, where each band includes a discrete set of spatial frequencies. The first spectral band includes the DC coefficient and some number of neighboring AC coefficients. Successive bands incorporate higher frequency coefficients until all coefficients have been selected. There are various ways to select the spectral bands. One method is to treat each diagonal, starting with the DC coefficient and working right and down, as a separate spectral band. Another method is to group coefficients with similar variances, where each coefficient's variance is calculated using representative test images [6].
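The diagonal grouping just described can be sketched as below. The anti-diagonals of a block are exactly the segments traversed by the zig-zag scan, so each diagonal band corresponds to a contiguous range of zig-zag positions. The function names and the example pass boundaries are illustrative, not taken from the JPEG specification.

```python
def diagonal_bands(n=8):
    """Group the (row, col) positions of an n x n DCT block by anti-diagonal.

    Band 0 holds only the DC coefficient; band k holds every position
    with row + col == k.
    """
    bands = [[] for _ in range(2 * n - 1)]
    for r in range(n):
        for c in range(n):
            bands[r + c].append((r, c))
    return bands


def spectral_pass(block, bands, first, last):
    """Coefficients belonging to bands first..last (inclusive) for one pass."""
    return {pos: block[pos[0]][pos[1]]
            for band in bands[first:last + 1]
            for pos in band}


bands = diagonal_bands()
print(len(bands), bands[:2])  # 15 [[(0, 0)], [(0, 1), (1, 0)]]
```

A first pass covering bands 0 through 2, for example, would carry the DC coefficient and the five lowest-frequency AC coefficients of every block.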

Spectral selection tends to produce blocking artifacts when only a few spectral bands have been received, since only low-frequency content is transmitted first. Successive approximation provides more visually pleasing performance by transmitting a portion of all non-zero DCT coefficients in each pass [6]. Each quantized coefficient is represented as a binary integer and, within that binary value, the most perceptible content is carried in the most significant bits. Therefore, on the first pass, a specified number of the most significant bits of each non-zero coefficient are encoded. On successive passes, the less significant bits are encoded. Successive approximation yields a more graceful transition in image quality than spectral selection since each pass includes some high-frequency content. However, successive approximation incurs greater coder complexity compared to spectral selection [6].
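Successive approximation can be sketched on a list of quantized coefficients, assuming a point transform of `al` bits. The first pass carries each sign and each magnitude shifted right by `al` bits. Every refinement pass then carries one further magnitude bit. This simplifies JPEG's actual scan syntax, which also entropy codes the passes and sends a sign only when a coefficient first becomes nonzero. The names are hypothetical, and magnitude bits not yet received are set to zero on reconstruction.

```python
def split_passes(coeffs, al):
    """Split coefficients into a first pass and `al` one-bit refinement passes."""
    signs = [-1 if c < 0 else 1 for c in coeffs]
    first = [abs(c) >> al for c in coeffs]        # most significant bits
    refinements = [
        [(abs(c) >> bit) & 1 for c in coeffs]     # one lower bit per pass
        for bit in range(al - 1, -1, -1)
    ]
    return signs, first, refinements


def reconstruct(signs, first, refinements, al):
    """Approximate coefficients from the first pass plus any refinements received."""
    out = []
    for i, mag in enumerate(first):
        for bits in refinements:
            mag = (mag << 1) | bits[i]
        mag <<= al - len(refinements)             # unreceived low bits set to 0
        out.append(signs[i] * mag)
    return out


signs, first, refs = split_passes([-37, 5, 0, 12], al=3)
for received in range(len(refs) + 1):
    print(reconstruct(signs, first, refs[:received], 3))
# [-32, 0, 0, 8] -> [-36, 4, 0, 12] -> [-36, 4, 0, 12] -> [-37, 5, 0, 12]
```

Every coefficient, including the high-frequency ones, improves with each pass, which is why the transition in quality is smoother than with spectral selection.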

Progressive JPEG may be viewed as providing a "preview" image and then successively decreasing the distortion by transmitting additional coefficient information. A similar approach in layered coding is to transmit a base layer and then an enhancement layer that reduces the coding error in the base layer.

Rhee and Gibson [13] have proposed a two-layer coding scheme targeting ISDN, enabling support for one or both B channels depending on the available capacity (64-128 kbps). One channel transmits an H.261 encoded base layer while the other channel sends an enhancement layer constrained to no more than 64 kbps. As H.261 is similar to the H.263 codec described in Section III.E, only the enhancement layer is covered here.

After encoding a frame, an H.261 coder decodes the frame to serve as a local
reference for motion compensation when encoding the next frame [67]. Rhee and 
Gibson’s proposed coding scheme compares the original frame to the decoded frame and 
determines the MSE introduced by coding for each block. The block errors are sorted 
from highest to lowest, and the B blocks with the highest error are selected for 
enhancement. While the number of blocks selected is fixed (160 in the simulations), the 
location of the blocks varies each frame depending on scene content. After the blocks are 
selected, b bits are allocated to each block such that Bb equals the desired bit rate per 
frame. The bits are allocated to encode the error at each pixel within a selected block 
based on a bit allocation scheme that considers the observed error variance at each pixel 
in test video sequences. Pixels demonstrating larger error variances are allocated a 
greater proportion of the bits; the bit assignment remains constant throughout the video 
session. 
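The block selection step can be sketched as follows, assuming blocks are given as flat lists of pixel values. The sketch ranks blocks by the MSE between the original and locally decoded frame, keeps the B worst, and gives each b = (bits per frame) / B bits. The function name is hypothetical. The per-pixel bit allocation derived from error variances in test sequences is not reproduced here.

```python
def select_enhancement_blocks(original_blocks, decoded_blocks,
                              num_blocks, bits_per_frame):
    """Pick the num_blocks blocks with the highest coding MSE.

    Returns (indices of the selected blocks, bits allocated per block),
    so that num_blocks * bits_per_block does not exceed bits_per_frame.
    """
    errors = []
    for idx, (orig, dec) in enumerate(zip(original_blocks, decoded_blocks)):
        mse = sum((o - d) ** 2 for o, d in zip(orig, dec)) / len(orig)
        errors.append((mse, idx))
    errors.sort(reverse=True)                  # highest error first
    chosen = [idx for _, idx in errors[:num_blocks]]
    bits_per_block = bits_per_frame // num_blocks
    return chosen, bits_per_block


orig = [[0, 0], [0, 0], [0, 0]]
dec = [[1, 1], [3, 3], [0, 0]]
print(select_enhancement_blocks(orig, dec, num_blocks=2, bits_per_frame=1000))
# ([1, 0], 500)
```

Because the ranking is recomputed every frame, the selected locations follow the scene content while the number of enhanced blocks stays fixed (160 in the reported simulations).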

Another layered refinement scheme based on H.261, proposed by Rhee and Gibson [68], uses the refinement layer to more accurately describe motion present within the frame. H.261 performs motion compensation at the 16x16 macroblock level, which sacrifices the more precise motion information available using 8x8 blocks but is computationally faster [67]. The enhancement layer considers the displacement of the individual blocks comprising a macroblock and yields more accurate motion prediction and better visual quality.¹⁰

The baseline H.261 coder performs macroblock-level motion prediction by comparing the current 16x16 macroblock to candidate 16x16 regions of the previous frame and selecting the best match. The difference between the macroblocks is quantized, encoded, and stored along with the macroblock motion vector. In a parallel operation, block-level motion prediction is performed for the four blocks comprising the current macroblock. The macroblock motion vector is subtracted from each of the individual block motion vectors, giving four residual motion vectors. The residual motion vectors are stored along with their respective encoded difference blocks in the refinement layer. At the decoder, both the baseline H.261 and refinement streams are decoded simultaneously. Within the H.261 stream, the macroblock motion vectors and associated difference macroblocks are used to update the current frame. If the refinement layer contains information for a particular macroblock, the baseline-decoded blocks are replaced with updated blocks using the block-level motion vectors.
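The residual motion vector bookkeeping described above reduces to a subtraction at the encoder and an addition at the decoder. A minimal sketch, with vectors as (x, y) integer pairs and hypothetical function names:

```python
def residual_motion_vectors(mb_vector, block_vectors):
    """Encoder: subtract the macroblock vector from each of the four block vectors."""
    mx, my = mb_vector
    return [(bx - mx, by - my) for bx, by in block_vectors]


def block_vectors_from_residuals(mb_vector, residuals):
    """Decoder: add the macroblock vector back to each residual."""
    mx, my = mb_vector
    return [(rx + mx, ry + my) for rx, ry in residuals]


mb = (3, -1)
blocks = [(3, -1), (4, -1), (2, 0), (3, -2)]
res = residual_motion_vectors(mb, blocks)
print(res)  # [(0, 0), (1, 0), (-1, 1), (0, -1)]
```

Since the block vectors usually lie close to the macroblock vector, the residuals are small and cheap to entropy code in the refinement layer.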
2. Pyramid Coding


The pyramid coding scheme proposed by Burt and Adelson [69] extends well to a layered representation of still images and has been extended into the temporal domain for video coding [48]. Pyramid coding employs a simple but effective prediction scheme. The image is low-pass filtered, decimated by a factor of two, and then quantized. The result is a base image that is a coarse representation of the original. Next, the base image is interpolated back to the original image's resolution, filtered, and subtracted from the original image to produce a prediction error. If the image has a low-frequency characteristic, usually a good assumption, the error image contains little energy and compresses very well. The base image is stored or transmitted using lossless compression while the error image is compressed using a lossy coder. At the decoder, the error image is added to an interpolated version of the base image to reconstruct the original image. Although pyramid coding is lossy, the error results only from quantization of the error image, which may be bounded through proper choice of the quantizer.

¹⁰ H.263 offers block-level motion compensation as an option [56].
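A one-step pyramid can be sketched on a single row of pixels. The two-tap averaging filter and the sample-repetition interpolator are simplifications chosen for illustration, not the filters of [69], and the post-interpolation filtering is omitted. The uniform step size of the error quantizer is an assumed parameter. The sketch shows the key property stated above: every reconstructed pixel lies within half a quantizer step of the original.

```python
def lowpass_decimate(row):
    """Average each pixel pair: a two-tap low-pass filter followed by decimation by 2."""
    if len(row) % 2:
        raise ValueError("row length must be even")
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row), 2)]


def interpolate(base):
    """Return to full resolution by repeating each base sample twice."""
    return [v for v in base for _ in range(2)]


def pyramid_encode(row, step):
    """One-step pyramid: integer base row plus a quantized prediction error."""
    base = lowpass_decimate(row)            # integer division acts as the base quantizer
    prediction = interpolate(base)
    error = [x - p for x, p in zip(row, prediction)]
    q_error = [round(e / step) for e in error]  # the only lossy step
    return base, q_error


def pyramid_decode(base, q_error, step):
    """Add the dequantized error to the interpolated base row."""
    prediction = interpolate(base)
    return [p + q * step for p, q in zip(prediction, q_error)]


row = [10, 12, 20, 24, 7, 7]
base, q_error = pyramid_encode(row, step=2)
print(base, pyramid_decode(base, q_error, step=2))
```

With a step size of 1, the error is transmitted exactly and the reconstruction is lossless. Repeating `lowpass_decimate` on the base row yields the multi-step pyramid discussed next.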

The previous description applies to one-step pyramid coding. A multi-step pyramid is implemented by successively repeating the filtering and decimation operations until the desired size base image is produced; each step reduces the image to one quarter of its previous size. For an n-level pyramid, the result is a heavily filtered base image and a series of n-1 error images. The drawback to a multi-step pyramid is increased computational demand as well as increased encoding delay and increased oversampling of the image.

The CafeMocha encoder [70] uses pyramid coding to form two layers, and each 
layer is transmitted to a separate multicast group using two RTP sessions. CafeMocha 
transmits video at a resolution of 320x240 with 4 bits/pixel. The base layer uses the 
popular CU-SeeMe video coder [1] at a lower resolution of 160x120, and the 
enhancement layer uses a pyramidal coder to improve the resolution to 320x240. The 
CU-SeeMe coding algorithm uses block replenishment followed by lossless compression. 
A 320x240 frame is first decimated to obtain a 160x120 base frame. Each 8x8 block in 
the base frame is then compared to its counterpart in the last base frame and is selected
for transmission if the difference exceeds a threshold. The selected blocks are losslessly 
compressed and placed into packets of no greater than 1000 bytes to avoid fragmentation 
along the transmission path. 

Instead of forming an error frame, the pyramid coder generates error blocks.
Each 8x8 block selected for transmission in the base layer is interpolated to give a 16x16
macroblock. The interpolated macroblock is then subtracted from the corresponding
macroblock in the 320x240 image to form an error macroblock. The error macroblock is
losslessly compressed using run-length coding and packetized as above. The results in
[70] indicate that the addition of a second layer improves visual quality compared to a
320x240 CU-SeeMe video stream when subjected to a 50% packet loss rate.
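The error-macroblock step can be sketched as below. This is an illustrative sketch only: pixel replication as the 8x8 to 16x16 interpolator and a raster-order (value, run) run-length format are assumptions, not details taken from [70].

```python
# Sketch of the enhancement-layer step described for CafeMocha [70].
# Assumptions (not from [70]): pixel replication interpolates 8x8 -> 16x16,
# and run-length coding emits (value, run) pairs in raster order.

def interpolate_block(block8):
    """Upsample an 8x8 base-layer block to a 16x16 macroblock by pixel replication."""
    return [[block8[r // 2][c // 2] for c in range(16)] for r in range(16)]

def error_macroblock(mb16, block8):
    """Full-resolution macroblock minus the interpolated base block."""
    up = interpolate_block(block8)
    return [[mb16[r][c] - up[r][c] for c in range(16)] for r in range(16)]

def run_length_encode(block):
    """Raster-scan the block and emit (value, run_length) pairs."""
    pairs = []
    for value in (v for row in block for v in row):
        if pairs and pairs[-1][0] == value:
            pairs[-1] = (value, pairs[-1][1] + 1)
        else:
            pairs.append((value, 1))
    return pairs

def run_length_decode(pairs, width=16):
    flat = [value for value, run in pairs for _ in range(run)]
    return [flat[i:i + width] for i in range(0, len(flat), width)]

if __name__ == "__main__":
    mb = [[10] * 16 for _ in range(16)]
    mb[0][0] = 12                         # one pixel of fine detail
    base = [[10] * 8 for _ in range(8)]   # base-layer block
    err = error_macroblock(mb, base)
    print(run_length_encode(err))         # [(2, 1), (0, 255)]
```

Because the base layer already carries the coarse content, the error macroblock is mostly zeros, which is what makes a simple run-length coder effective here.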

Gharavi and Partovi have proposed a multi-grade, layered coding scheme that
combines elements of pyramid and subband coding along with DPCM [71]. Instead of
providing increasing grades of quality at a fixed resolution, the coder provides scalable
resolutions and accepts lower image quality at higher resolutions. Three layers are
employed: a base layer (L1) and two contribution layers (C1 and C2). The different
resolutions are obtained by combining the appropriate layers prior to the decoder as
indicated in Table IV.1.


Quality Grade   Resolution   Layers Required
Q1              352x240      L1
Q2              704x480      L1+C1
Q3              1408x960     L1+C1+C2

Table IV.1: Resolutions Supported in Gharavi and Partovi's Layered Coder.


Video is captured at the highest resolution (Q3) and low-pass filtered and
decimated to obtain the next lower grade (Q2), which is in turn low-pass filtered and
decimated to obtain the lowest quality video (Q1). Q1 is encoded using a hybrid
DCT/DPCM scheme compatible with H.261. The Q2 and Q3 video streams are encoded
separately but in the same manner using hybrid subband/DPCM encoders.

3. Wavelet and Subband Coding 


Wavelet and subband coding provide a good starting point for designing a layered
coder since each image or frame is resolved into a series of subbands that follow a strict
hierarchy [48]. As discussed in Section III.D, a single-level, two-dimensional wavelet decomposition of an
image yields an average subband LL, representing the low pass frequency components of 
the image, and the detail subbands LH, HL, and HH, representing higher frequency detail 
in the horizontal, vertical, and diagonal directions, respectively. The following is one of 
several approaches to realize a simple layered coder using a wavelet transform: 

• Compress each frame separately by using the wavelet transform.
• Quantize and entropy encode each subband separately.
• Form three layers based on the frequency content: a base layer (LL subband),
  a first enhancement layer (LH and HL subbands), and a second enhancement
  layer (HH subband).


A coder employing this approach is shown in Figure IV.4. At the receiver, the
layers are decoded and inverse wavelet transformed prior to video display. If any layers
are dropped due to bandwidth limits (or possibly errors), their wavelet coefficients are
assumed to be zero and the frame is reconstructed using the remaining subbands.

Figure IV.4: Basic Layered Video Coder Using Wavelets.
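The layering and receiver behavior just described can be sketched as follows, assuming a one-level averaging Haar transform on 2x2 pixel groups (subband naming follows Table IV.2 later in this chapter). Quantization and entropy coding are omitted, and the `LAYERS` table and layer names are illustrative assumptions.

```python
# Sketch of the three-layer wavelet coder outlined above: one-level averaging
# Haar transform, layers LL | LH+HL | HH, and zero-fill for dropped layers.
# Quantization/entropy coding omitted; LAYERS and layer names are illustrative.

LAYERS = {"base": ("LL",), "enh1": ("LH", "HL"), "enh2": ("HH",)}

def haar2d(img):
    """Split an image (even dimensions) into LL, LH, HL, HH subbands."""
    h, w = len(img) // 2, len(img[0]) // 2
    bands = {name: [[0.0] * w for _ in range(h)] for name in ("LL", "LH", "HL", "HH")}
    for r in range(h):
        for c in range(w):
            p00, p01 = img[2*r][2*c], img[2*r][2*c+1]
            p10, p11 = img[2*r+1][2*c], img[2*r+1][2*c+1]
            bands["LL"][r][c] = (p00 + p01 + p10 + p11) / 4
            bands["LH"][r][c] = (p00 + p01 - p10 - p11) / 4  # vertical difference: horizontal edges
            bands["HL"][r][c] = (p00 - p01 + p10 - p11) / 4  # horizontal difference: vertical edges
            bands["HH"][r][c] = (p00 - p01 - p10 + p11) / 4  # diagonal detail
    return bands

def inverse_haar2d(bands):
    h, w = len(bands["LL"]), len(bands["LL"][0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for r in range(h):
        for c in range(w):
            ll, lh = bands["LL"][r][c], bands["LH"][r][c]
            hl, hh = bands["HL"][r][c], bands["HH"][r][c]
            img[2*r][2*c] = ll + lh + hl + hh
            img[2*r][2*c+1] = ll + lh - hl - hh
            img[2*r+1][2*c] = ll - lh + hl - hh
            img[2*r+1][2*c+1] = ll - lh - hl + hh
    return img

def decode(bands, received):
    """Zero the subbands of any layer not received, then invert the transform."""
    kept = {s for layer in received for s in LAYERS[layer]}
    zeroed = {name: (band if name in kept else [[0.0] * len(band[0]) for _ in band])
              for name, band in bands.items()}
    return inverse_haar2d(zeroed)

if __name__ == "__main__":
    bands = haar2d([[0, 2], [4, 6]])
    print(decode(bands, ["base"]))                  # [[3.0, 3.0], [3.0, 3.0]]
    print(decode(bands, ["base", "enh1", "enh2"]))  # [[0.0, 2.0], [4.0, 6.0]]
```

With only the base layer, each 2x2 group collapses to its average; each enhancement layer restores part of the detail.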


If more layers are desired, the process can be repeated at the coder by applying 
the wavelet transform to the average (LL) subband to generate four higher order 
subbands. Following the approach outlined above, the compressed video could be 
transmitted using as many as seven distinct layers.

Bahl and Hsu have proposed a wavelet-based layered coder incorporating content 
sensitive spatial decomposition and multiresolution coding [72]. Spatial decomposition 
is performed via a split-and-merge algorithm [73]. A frame is split into blocks of 
identical size and then adjacent blocks of similar variance are merged to generate regions 
of common perceptual importance. After applying the algorithm, the results are saved as 
a segmentation mask and reused for subsequent frames. A new segmentation mask is 
only calculated if significant motion occurs within the frame. 

The coder decomposes each block using the fast Haar transform (FHT) and then
applies motion compensation, quantization, and variable-length coding to each subband.
Bit allocation is performed in proportion to the variance exhibited within each subband.
Transmission is prioritized by subband and region and, optionally, the receiver can
request priority updates for regions corrupted by packet loss within the network.
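Variance-proportional bit allocation can be illustrated with a short sketch. The rounding rule here (integer shares, then leftover bits to the largest fractional parts) is an assumption made so that the allocation sums exactly to the budget; [72] may round or weight differently.

```python
# Sketch of variance-proportional bit allocation, as described for [72].
# Assumption: leftover bits after truncation go to the largest fractional shares.

def allocate_bits(variances, total_bits):
    """Split total_bits across subbands in proportion to their variances."""
    total_var = sum(variances.values())
    if total_var == 0:
        raise ValueError("at least one subband must have nonzero variance")
    raw = {name: total_bits * v / total_var for name, v in variances.items()}
    alloc = {name: int(share) for name, share in raw.items()}
    leftover = total_bits - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

if __name__ == "__main__":
    print(allocate_bits({"LL": 600, "LH": 200, "HL": 150, "HH": 50}, 1000))
    # {'LL': 600, 'LH': 200, 'HL': 150, 'HH': 50}
```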

McCanne et al. have performed the most extensive work on the problem of
multicast video by proposing the RLM architecture for delivering multicast video over
heterogeneous networks [12]. In a follow-on work, the authors break the multicast video 
problem into two areas, the compression problem and the transport problem, and propose 
a comprehensive solution for both problems [45]. The compression problem is met with 
their proposed hybrid DCT/wavelet layered codec. The codec provides robust error 
resilience, low coder complexity for good run-time performance, and acceptable 
compression performance. 

Error resilience is provided through macroblock-based conditional replenishment 
wherein only the macroblocks that change in the current frame are encoded for 
transmission. While block replenishment does not offer the same compression gain
available with motion compensation, the authors argue that the difference is negligible
compared to the quality improvement it yields under packet loss.

After blocks are selected for replenishment, they are compressed spatially using a 
hybrid DCT/wavelet scheme. Each 16x16 macroblock is decomposed into four 
subbands. The LL band is created using a 1/3/3/1 biorthogonal wavelet, and the
remaining subbands are created using the discrete Haar transform [48]. The HH band 
contributes little energy to the reconstructed frame and is discarded. The LL block is 
further transformed with a DCT and the resulting coefficients are progressively encoded 
using spectral selection. The remaining LH/HL subbands are combined and are also 
progressively encoded using embedded zero-trees. 

Once all selected blocks within the current frame are encoded, a spatio-temporal 
hierarchy is created combining spatial and temporal layering. Within each encoded 
block, the progressively encoded DCT and wavelet coefficients are organized into a 
number of spatial layers. The possible combinations of bit-rate between spatial and 
temporal layers form a two-dimensional region where every trajectory provides a
compromise between visual quality and the rate of frame updates at increasing bit rates.

C. A LOW-COMPLEXITY ADAPTIVE LAYERED CODER DESIGN


In this section, we propose a new layered coder design. The goals in proposing a 
new coder are threefold. First, tactical considerations limit transmission bandwidth and 
place a premium on robust transmission. These considerations determine the type of 
compression techniques that are desirable or even feasible in a tactical video coder. 
Second, previously reported layered coding efforts are very diverse with emphasis on 
different network architectures or applications. Consensus on identifying a structured 
approach to designing layered coders or quantifying those parameters that make a layered 
coder effective is lacking. Third, a working coder provides a source for gathering 
statistical traffic data that is used in later chapters to model layered video traffic for 
network simulations and to examine error concealment issues. A working 
implementation of this coder is provided by [74] and was used to evolve the design. 

The guidelines observed in designing the layered coder flow from both the tactical 
VTC application and the considerations for designing an effective layered coder. The 
application imposes the following requirements. First, the coder must adaptively 
optimize compression for both low motion video and static slides. Second, the coder 
must possess a low complexity architecture to minimize coding delays and power 
requirements. Third, the coder must provide error resilient decoding at high packet loss 
rates. Fourth, the coder must constrain the bit rate to a predetermined average. Finally, 
the coder must meet the performance specifications listed in Table I.1. 

Implementing an effective coder within the above constraints involves due 
consideration of the following elements. First, the coder should transmit a base layer 
with acceptable quality and two (or more) enhancement layers such that each 
progressively improves perceptual quality. Second, the coder should minimize the 
bitstream overhead required to accommodate the layering structure. 

A functional diagram of the proposed coder is shown in Figure IV.5. Details for
each component are provided in subsequent sections.

Figure IV.5: Functional Block Diagram of the Hybrid FHT/DCT Layered Coder.


1. Block Selection for Motion Compensation 


Given the assumption of low activity video, temporal compression is provided 
through a simple block selection (updating) scheme that encodes only those macroblocks 
that show significant changes frame-to-frame. For low activity video, block selection 
yields only slightly inferior compression performance relative to motion prediction 
schemes [53]. Since interframe error propagation is greatly limited and intraframe error
propagation is eliminated, block selection also provides greater robustness. Block updating
further obviates the need for a locally decoded reference frame. This greatly simplifies the coder since an
inverse quantization/transform loop is not required. Block selection is considered here 
solely with regard to video sequences. Static sequences exhibit little or no motion and 
consequently make little use of block selection. Indeed, most transmissions that occur 
during static sequences arise from the considerations presented in the next section that 
require the inclusion of a block-aging algorithm. 

Motion is detected by applying a distance metric between successive frames. The
distance between each macroblock in the current frame and its counterpart in the previous
frame is calculated and the result compared to a threshold. To decrease computational
expense, the distance metric is applied to individual 8x8 blocks within the macroblock;
the first block to satisfy the threshold triggers selection and ends the search, thus avoiding
the expense of examining the remaining blocks. To further decrease computational
expense, distance calculations are confined to the luminance component of each pixel,
even if color components are present, since human visual acuity is more sensitive to
changes in luminance [6].

Since motion in VTC scenes tends to be confined to discrete objects within the
scene, as opposed to scene motion caused by a camera pan, search efficiency is slightly
affected by the order in which the individual blocks are examined. The more efficient
approach is to maximize the distance between the first two blocks examined. As shown
in Figure IV.6, two search patterns can be considered: a cross-pattern search that
examines the upper left block followed by the lower right, and a clockwise search starting
from the upper left. In the test video sequences examined, for those macroblocks selected
due to motion, the cross-pattern search resulted in a 2.5% decrease in the average number
of blocks examined per frame compared to the clockwise search, a net decrease of one
block per frame. Of course, the decrease depends on motion content; with increasing
motion, the difference becomes negligible.





Figure IV.6: Block Search Order: a) Clockwise Search and b) Cross-pattern Search. 


A much greater improvement is realized by using the cross-pattern search but
changing the starting block of each macroblock each frame to match the anticipated
motion at that point in the frame. Again, motion in VTC sequences is fairly confined.
For example, a speaker shifts left to right and/or slightly up and down. Consequently,
macroblocks tend to be selected in the same manner from frame to frame. Therefore, search
speed is increased by having the coder store the identity of the specific block, termed the
“anchor” block, that caused a particular macroblock to be selected in previous frames. For
each macroblock in the new frame, the block selection algorithm starts from the anchor
block. If the anchor block causes selection or if the macroblock is not selected, the
anchor block identity is unchanged. If another block causes selection, the anchor block
identity is updated. This search scheme produced an additional 20% improvement
in the number of blocks searched, resulting in 10 fewer blocks searched per frame on
average. A more complex approach not examined here is to remember the two blocks
that most frequently caused selection and tailor the search accordingly. The resulting
tailored search would be clockwise, counter-clockwise, or cross-pattern.
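The anchor-based search can be sketched as follows. This is a minimal sketch under assumptions: the four 8x8 blocks are numbered clockwise (0 upper left, 1 upper right, 2 lower right, 3 lower left), the order after the anchor and its diagonal opposite is chosen arbitrarily, and the per-block distance is the ASD of Eq. (IV.1) on luminance samples. The function names are hypothetical.

```python
# Sketch of early-terminating block selection with a cross-pattern search that
# starts from the stored "anchor" block. Assumed numbering (clockwise):
# 0 = upper left, 1 = upper right, 2 = lower right, 3 = lower left.

def asd(cur, ref):
    """Non-normalized ASD of Eq. (IV.1) over one block."""
    return abs(sum(x - y for rc, rr in zip(cur, ref) for x, y in zip(rc, rr)))

def cross_order(anchor):
    """Anchor first, then the diagonally opposite block, then the other two."""
    return [anchor, (anchor + 2) % 4, (anchor + 1) % 4, (anchor + 3) % 4]

def select_macroblock(cur_blocks, prev_blocks, anchor, threshold):
    """Return (selected, new_anchor, blocks_examined) for one macroblock."""
    examined = 0
    for b in cross_order(anchor):
        examined += 1
        if asd(cur_blocks[b], prev_blocks[b]) > threshold:
            return True, b, examined   # first block over threshold ends the search
    return False, anchor, examined     # not selected: anchor unchanged

if __name__ == "__main__":
    flat = [[0] * 8 for _ in range(8)]
    moved = [[5] * 8 for _ in range(8)]          # ASD = 5 * 64 = 320
    prev, cur = [flat] * 4, [flat, flat, moved, flat]
    print(select_macroblock(cur, prev, anchor=0, threshold=160))  # (True, 2, 2)
    print(select_macroblock(cur, prev, anchor=2, threshold=160))  # (True, 2, 1)
```

Once the anchor has moved to the block where motion occurs, later frames with the same motion pattern need only one block comparison for that macroblock.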

The distance metric employed is the non-normalized ASD given by [6]:


$$ \mathrm{ASD} = \left| \sum_{m=1}^{M} \sum_{n=1}^{N} \left( x_{m,n} - x^{r}_{m,n} \right) \right| \qquad \text{(IV.1)} $$

where $x_{m,n}$ and $x^{r}_{m,n}$ represent the pixel intensities in the predicted and reference blocks,
respectively. This expression of ASD differs from the form given by Eq. (III.10) in that
the result is not normalized by the number of pixels and the reference macroblock is not 
offset. The non-normalized version is used since the normalization factor is easily
included in the threshold value, saving the cost of a floating point division operation or,
at least, a right-shift operation. The ASD is employed for its computational efficiency, as
it only requires additions and subtractions along with a single absolute value operation.
SAD requires an equal number of arithmetic operations but requires MN − 1 more
absolute value operations. Further, since the ASD takes the absolute value of only the 
sum, it acts like an accumulator and provides a low-pass filtering effect that removes 
noise added to pixel intensities through video capture. Smoothing prevents spurious 
block selection in otherwise static screen regions that could occur in other metrics, such 
as SAD or MSE, where non-linear operations on a per-pixel basis tend to accumulate 
noise energy. This allows bandwidth to be more effectively devoted to regions of greater 
interest [45]. 
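The filtering effect of taking the absolute value only once can be seen in a small sketch comparing ASD and SAD on one 8x8 block. The capture noise here is an assumed ±1 checkerboard, chosen so that it cancels exactly under ASD; real capture noise cancels only approximately.

```python
# Sketch comparing the non-normalized ASD of Eq. (IV.1) with SAD on an 8x8 block.
# The +/-1 checkerboard "noise" is an illustrative assumption.

def asd(cur, ref):
    """|sum of differences|: one absolute value per block."""
    return abs(sum(x - y for rc, rr in zip(cur, ref) for x, y in zip(rc, rr)))

def sad(cur, ref):
    """sum of |differences|: one absolute value per pixel."""
    return sum(abs(x - y) for rc, rr in zip(cur, ref) for x, y in zip(rc, rr))

ref = [[100] * 8 for _ in range(8)]
noisy = [[100 + (1 if (r + c) % 2 else -1) for c in range(8)] for r in range(8)]
shifted = [[102] * 8 for _ in range(8)]       # genuine change of 2 intensity levels

print(asd(noisy, ref), sad(noisy, ref))       # 0 64
print(asd(shifted, ref), sad(shifted, ref))   # 128 128
```

Because the sum is not normalized, a per-block threshold T corresponds to a mean per-pixel difference of T/64; if the threshold of 160 quoted below applies per 8x8 block, that is 2.5 intensity levels. The same cancellation that suppresses noise can also hide a change whose positive and negative differences balance.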

The relative selectivity of ASD and SAD was tested by determining the relative
thresholds required to deliver approximately the same quality, as measured by pSNR, and
then comparing the resulting block selection rates and patterns. As Figure IV.7 shows, a
threshold index below 8-10 was required to adequately capture scene motion.
In this region, ASD selects 1-2 more macroblocks than SAD. However,
examining the macroblocks selected confirmed that ASD tended to better capture speaker
motion while SAD's selections were more diffuse. As a result, notwithstanding the
pSNR equivalence, video compressed using ASD was judged more visually pleasing.
The difference in bandwidth appears negligible considering the vast decrease in
computational effort required by ASD.


[Figure: average pSNR (dB) and average number of selected macroblocks versus threshold step index (2 to 20), for ASD and SAD.]

Figure IV.7: Comparison of ASD and SAD for Block Selection.


Two independent elements affect video quality and thus the required bit rate:
adequate motion detection to prevent “jerky” motion in the reconstructed video and
control of the distortion introduced by quantization. The goal in motion detection is to
select the maximum block selection threshold that adequately captures motion. In the
video sequences examined, a threshold of 160 proved adequate. At this threshold value,
an average of 24.8 macroblocks was selected per frame in the test video sequences¹¹. In
practice, a user-selectable threshold would prove beneficial by allowing the sender to
compromise between motion selection and visual distortion given a set bit rate.

2. Aging Algorithm


Motion compensation using only block refreshment through the selection scheme 
described above presents some problems [45]. Consider an arbitrary macroblock whose 
content is changing due to motion within the frame. The macroblock travels from its 
initial state along some trajectory to a final state once the motion has stopped. At some 
point in the trajectory, the block selection algorithm forces an update to the macroblock. 
Once the final state is reached, hysteresis occurs if the distance between the final and
updated states is not sufficient to force block selection, that is, if the distance is less than
the threshold. In this case, the macroblock is not selected for updating, and the displayed
macroblock at the receiver is left with a persistent error. Another problem occurs when
new participants are allowed to join a VTC in progress (dynamic multicast) [45]. Since
the coder is only transmitting those macroblocks selected due to motion, new participants
receive only a portion of the current scene. With low activity video, the end result is a patchy
“disembodied” speaker. The final problem is the duration of error artifacts due to
missing or corrupt packets at the receiver. Artifacts created in the active portion of the
scene tend to last for only a single frame since block updates occur frequently. However,
errors in less dynamic regions tend to persist longer since the frequency of updates is
correspondingly lower. Due to lower motion content, each of these problems is of greater
concern during static sequences, since the block updating scheme selects either a few
macroblocks to transmit, given an on-screen cursor, or none at all.

Coupling the block update scheme with an aging scheme that forces periodic
updates of each macroblock alleviates these problems. The general principle is that the
coder tracks the time interval or age since each macroblock was last selected. If a
macroblock's age exceeds a predetermined interval, that macroblock is selected by
default. Aging thus guarantees a maximum period between macroblock updates. This
bounds both the duration of hysteresis errors and visual artifacts caused by losses and
errors during transmission. The bound also ensures that new viewers receive an entire
frame in a timely manner.

¹¹ Actually, more macroblocks are selected due to forced selections, as covered in the next section.

Obviously, aging increases bandwidth requirements, but the impact is lessened by 
the manner in which macroblocks are selected through aging and the length of the aging
interval. Spreading block selections evenly over time is desirable to avoid spikes in bit
rate, which in turn requires a scheme that ages each block independently. Simply
choosing to update a block after n frames pass without an update leads to an undesirable
correlation in updates following each scene change. Even though motion within the
scene tends to randomize updates to some extent, a sufficiently static background would
still lead to correlation of a significant fraction of block updates. The worst case is
represented by a scene change where the new scene is entirely static, such as a slide
presentation. In this case, the bit rate spikes every n frames. Increasing the aging interval
decreases bandwidth but increases the duration of visual errors and degrades response 
time for new participants. 

The aging algorithm used in the coder does not track the age of each macroblock 
directly. Instead, each macroblock has an entry in an update table identifying the number 
of frames remaining until that macroblock must be updated. As each frame passes 
without an update, the entry is decremented by one. As each macroblock is processed for 
block selection in a given frame, the coder examines the macroblock’s entry in the update 
table. If its corresponding entry has reached zero (or below), the macroblock is selected for
transmission. Otherwise, the distance metric is applied to determine if the macroblock
should be selected due to motion. The order of the two events is important. Since the
distance metric does not need to be calculated for those macroblocks selected due to
aging, the result is a net decrease in the number of calculations required to select
macroblocks for transmission. In either case, after a macroblock has been transmitted, a
new update is scheduled m frames in the future, where m is a discrete uniform random
variable distributed in the range [1, n]. Pseudocode for this algorithm is listed in Figure
IV.8. Each update table entry is initialized to 0 at the start of coding in recognition that all
macroblocks in the first frame must be coded.


initialize update_table[1..99] to 0

for each frame k

    % Process each macroblock in frame
    for each MB j = 1 to 99

        % Count down to next forced update
        update_table[j] -= 1

        % Check for forced update (entries start at 0, so every MB is coded in frame 1)
        if update_table[j] <= 0
            encode MB j
            update_table[j] = random integer, uniform on [1, n]

        % Check for block selection due to motion
        else if distance(MB j) > threshold
            encode MB j
            update_table[j] = random integer, uniform on [1, n]
        end

    end

end





Figure IV.8: Pseudocode for Aging Algorithm. 
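The pseudocode in Figure IV.8 translates directly into the sketch below. The `moved(k, j)` callback is a hypothetical stand-in for "distance(MB j) > threshold" in frame k, and the fixed random seed is only there to make runs repeatable.

```python
import random

NUM_MB = 99  # macroblocks per frame, matching the loop bound in Figure IV.8

def code_sequence(num_frames, n, moved, seed=0):
    """Run block selection with aging; return a list of (frame, mb, reason).

    moved(k, j) is a hypothetical callback standing in for
    "distance(MB j) > threshold" in frame k.
    """
    rng = random.Random(seed)
    update_table = [0] * NUM_MB        # 0 forces every MB in the first frame
    log = []
    for k in range(num_frames):
        for j in range(NUM_MB):
            update_table[j] -= 1       # count down to next forced update
            if update_table[j] <= 0:   # forced update: distance is never computed
                log.append((k, j, "aging"))
                update_table[j] = rng.randint(1, n)   # uniform on [1, n], inclusive
            elif moved(k, j):          # selection due to motion
                log.append((k, j, "motion"))
                update_table[j] = rng.randint(1, n)
    return log

if __name__ == "__main__":
    # A fully static scene: every update is forced by aging.
    log = code_sequence(2000, n=20, moved=lambda k, j: False)
    forced = sum(1 for k, _, _ in log if k > 0)
    print(forced / 1999)   # average forced updates per frame after frame 0
```

For a fully static scene with n = 20, the printed average should land close to the 9.43 macroblocks per frame derived below, and no macroblock ever waits more than n frames for an update.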


Using a uniform distribution to schedule updates smooths block selections over n
frames and decorrelates the selection of individual macroblocks through aging. Choosing
aging intervals randomly also prevents events, such as scene changes, from correlating
updates and generating spikes in bit rate. The value chosen for n controls the tradeoff
between additional bandwidth required and coder responsiveness. For a given value of n,
since the mean update interval is (n + 1)/2 frames, the number of additional macroblocks
selected through aging per frame, $N_a$, is

$$ N_a = \frac{2 N_{MB}}{n + 1} \qquad \text{(IV.2)} $$

where $N_{MB}$ is the number of macroblocks in the frame. The actual bandwidth impact is
lower since some of the blocks selected via aging would have been selected anyway due
to scene motion.

For the video sequences examined in this work, n was set to 20. This value offers
an acceptable compromise between bandwidth, corresponding to an additional 9.43
macroblocks per frame, and responsiveness. New VTC participants are guaranteed to
receive a complete frame within 2 seconds at 10 fps, and visual errors are bounded by the
same value.

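The figures quoted above follow from N_a = 2 N_MB / (n + 1), which holds when every update is driven by aging and intervals are uniform on [1, n], together with the worst-case refresh time of n frames. The small sketch below reproduces the arithmetic; the function names are illustrative.

```python
from fractions import Fraction

def aging_overhead(num_mb, n):
    """Expected forced updates per frame when intervals are uniform on [1, n]."""
    mean_interval = Fraction(n + 1, 2)   # mean of a discrete uniform on [1, n]
    return num_mb / mean_interval

def max_refresh_seconds(n, fps):
    """Worst-case time before every macroblock has been refreshed."""
    return n / fps

print(round(float(aging_overhead(99, 20)), 2))   # 9.43
print(max_refresh_seconds(20, 10))               # 2.0
```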
3. Layering Strategy


Macroblocks selected for transmission are decomposed into layers using a 
wavelet transform. Since the selection process takes place before the transform stage, the 
transform is only applied to those macroblocks requiring transmission. A wavelet-based 
approach was chosen since frequency decomposition offers the most flexibility in 
populating layers. A macroblock may reasonably be decomposed into as many as sixteen
4x4 subbands, using a uniform decomposition, which then may be combined in various
manners to create an arbitrary number of layers (up to sixteen). The challenge is in
determining an appropriate number of layers and apportionment of the frequency content
within the macroblock across those layers. 

As layers are hierarchical in importance, layer assignments should map frequency 
content to that hierarchy in a manner consistent with perceptual importance. Just as 
important, the bit rate allocation resulting from the layer assignments should be 
segregated such that dropping a layer offers the potential for decreasing congestion. In
practice, meeting these expectations with a single layering scheme proved impractical. 
Therefore, two specific layering schemes were required: one for video sequences and one 
for static presentation slide sequences. 

For both types of sequences, layering is accomplished through application of the
fast Haar transform (FHT) to each selected macroblock. The FHT is the simplest possible 
wavelet algorithm [60] and is described by 


$$ x^{a}(n) = \frac{1}{2}\left( x(2n) + x(2n+1) \right) \qquad \text{(IV.3)} $$

$$ x^{d}(n) = \frac{1}{2}\left( x(2n) - x(2n+1) \right) \qquad \text{(IV.4)} $$

where $x$ is the original data vector, and vectors $x^{a}$ and $x^{d}$ are the average and detail
decomposition vectors, respectively. The FHT has several desirable properties with
regard to minimizing coder complexity. First, the FHT is a real transform, so no complex 
arithmetic is required and storage is simplified. Second, the FHT is not computationally 
demanding as its application requires only addition, subtraction, and left- and right-shifts.
Finally, unlike more sophisticated wavelet transforms, the FHT does not require
extending or padding the data set. However, the simplicity of the FHT can lead to 
blocking artifacts at high compression levels since the average and detail calculations are 
confined only to contiguous pixels. 
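One level of Eqs. (IV.3) and (IV.4), together with the inverse that recovers the original samples, can be sketched as follows. The factor 1/2 is written as a division here; a fixed-point coder would implement it as a right shift, which is an implementation assumption and would drop the least significant bit unless the sum and difference are kept at full precision.

```python
# Sketch of one level of the fast Haar transform, Eqs. (IV.3)-(IV.4), and its inverse.

def fht(x):
    """Split x (even length) into average and detail vectors."""
    avg = [(x[2*n] + x[2*n+1]) / 2 for n in range(len(x) // 2)]
    det = [(x[2*n] - x[2*n+1]) / 2 for n in range(len(x) // 2)]
    return avg, det

def inverse_fht(avg, det):
    """x(2n) = a(n) + d(n), x(2n+1) = a(n) - d(n)."""
    x = []
    for a, d in zip(avg, det):
        x.extend([a + d, a - d])
    return x

avg, det = fht([10, 12, 8, 4])
print(avg, det)                # [11.0, 6.0] [-1.0, 2.0]
print(inverse_fht(avg, det))   # [10.0, 12.0, 8.0, 4.0]
```

Each output sample depends on only two neighboring inputs, which is why no padding is needed and also why strong quantization can leave blocking artifacts.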

Since video information is two-dimensional, Eq. (IV.3) and Eq. (IV.4) can be
applied to each dimension independently, resulting in four uniform subbands as discussed
in Section III.F. A key difference from that discussion is that the average and detail
equations are applied to individual macroblocks instead of the entire frame. The resulting
average (LL) subband and the three detail subbands (HL, LH, and HH) are each 8x8 in
size. The actual operations required to generate each subband and the physical
significance of each subband are given in Table IV.2.


Subband   Detail       Horizontal Operation   Vertical Operation
LL        Lowpass      Average                Average
LH        Horizontal   Average                Detail
HL        Vertical     Detail                 Average
HH        Diagonal     Detail                 Detail

Table IV.2: Significance and Determination of Wavelet Subbands.
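The operations in Table IV.2 can be sketched for a single 16x16 macroblock as below. Rows are transformed first (horizontal operation) and columns second (vertical operation). The helper names are illustrative, and the striped test macroblock is an assumed example chosen so that only the LL and LH subbands are nonzero.

```python
# Sketch of a first order 2-D FHT on a 16x16 macroblock, producing the four
# 8x8 subbands of Table IV.2. Helper names are illustrative.

def fht(x):
    """One level of Eqs. (IV.3)/(IV.4): (averages, details)."""
    half = len(x) // 2
    return ([(x[2*n] + x[2*n+1]) / 2 for n in range(half)],
            [(x[2*n] - x[2*n+1]) / 2 for n in range(half)])

def transpose(m):
    return [list(col) for col in zip(*m)]

def vertical(m):
    """Apply the FHT down each column; return (average, detail) row-major arrays."""
    avg_cols, det_cols = zip(*(fht(col) for col in transpose(m)))
    return transpose(avg_cols), transpose(det_cols)

def fht2d(mb):
    h_avg, h_det = zip(*(fht(row) for row in mb))  # horizontal operation on each row
    ll, lh = vertical(list(h_avg))  # horizontal average; vertical average / detail
    hl, hh = vertical(list(h_det))  # horizontal detail; vertical average / detail
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}

# Horizontal stripes: rows alternate 0 and 100, so only LL and LH are nonzero.
stripes = [[100 if r % 2 else 0 for _ in range(16)] for r in range(16)]
bands = fht2d(stripes)
print(bands["LL"][0][:3], bands["LH"][0][:3], bands["HL"][0][:3])
# [50.0, 50.0, 50.0] [-50.0, -50.0, -50.0] [0.0, 0.0, 0.0]
```

The striped example confirms the "Horizontal" significance of LH in Table IV.2: horizontal edges show up only after the vertical detail operation.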


The coder restricts the number of layers to three. The decision to consider no
more than three layers was driven by the limited bandwidth available. Each layer
consumes an equal amount of bandwidth in overhead. While a greater number of layers
offers more flexibility in managing quality and congestion, at 64-96 kbps, three layers
appears to be the limit in terms of producing layers that provide a perceptible
improvement in quality.

The initial layering strategy considered for both the video and the static slide
sequences performs only a first order analysis of each selected macroblock, generating
the subbands listed in Table IV.2. Each subband generated is assigned to a layer as
shown in Table IV.3. The layer assignments are intended to promote a graceful increase
in quality by progressively adding frequency content. The base layer is essentially a
lowpass-filtered version of the original macroblock, and the two enhancement layers
successively add in higher frequency details. Since the LL subband retains many of the
perceptual properties of the original macroblock, the LL subband is transformed further
using the 2-D DCT. The additional transform allows the LL subband to be processed
using JPEG, an approach that exploits that standard's emphasis on maximizing retention
of the most perceptually relevant information.


Layer              Subband(s) Included
Base               LL
1st Enhancement    LH, HL
2nd Enhancement    HH

Table IV.3: Preliminary Layer Assignments.


Preliminary results for the initial layering approach were disappointing. With
regard to video sequences, the base layer gave acceptable quality and the first
enhancement layer produced a marked improvement in quality. However, the bit rate
allocated to the second enhancement layer by this assignment scheme was small (< 10%),
and application of the layer only occasionally produced a perceptible improvement in
quality. For static slide sequences, the situation is reversed. Slides consist of text and
line drawings, which exhibit a different frequency characteristic than motion video. The
preponderance of sharp edges, in all directions, increases the relative importance of
higher frequency content in these frames relative to motion video frames. As a result, the
hierarchy given in Table IV.3 is reasonable for motion video but unsuited for static
sequences. Without the high frequency content, text and block diagrams decoded from the
base layer were blurry and indistinct. Even adding the first enhancement layer only yielded a marginal
improvement. Indeed, only the final addition of diagonal detail produced acceptable
quality.

The results indicate that a frequency-based hierarchical scheme designed for 
motion video is unsuitable for static sequences. Although examined further below, the 
converse also appears to be true. Therefore, separate layering schemes were formulated
for each sequence type. The coder deduces the type of sequence present and applies the
appropriate layering scheme. 

The ad hoc approach presented above indicates the need for a more general 
technique for determining an appropriate layering structure for a video stream. The
problem is to determine, given that layers are desired, to what degree a selected 
macroblock is decomposed and how the resulting subbands are allocated to each layer. 
Here, we propose a variant of the split-and-merge algorithm [73] applied at the 
macroblock level. Instead of applying the algorithm in the spatial domain to identify 
regions of equivalent activity, the algorithm is applied to selected macroblocks in the 
frequency domain to identify regions of similar energy and perceptual content. 
Essentially, the macroblock is split into equal segments using the FHT, subbands of 
approximately equal variance are grouped, and the resulting regions are allocated to
individual layers. At this point, dynamically changing the layering structure is not 
permitted. 

Given a representative video sequence, the first step of the algorithm is to split 
each macroblock using the FHT. The macroblock is split into equal subbands by 
recursively applying the FHT to each subband until the desired number of subbands is 
created. For example, a first order decomposition of the macroblock creates four 8x8 
subbands (LL, LH, HL, HH). A second order decomposition of each of these subbands
creates sixteen 4x4 subbands as shown in Figure IV.9. Continuing the example, a second
order decomposition of the LL subband results in the LLLL, LLLH, LLHL, and LLHH
subbands. Likewise, a third order decomposition produces 64 2x2 subbands. In practical
application, stopping at a second order decomposition proved sufficient for three layers.

Using the representative video sequences, the variance of the coefficient set
comprising each subband is determined across all frames of video. Using subband
variance as a metric to form layers offers two benefits. First, with motion video, variance
appears to decrease as spatial frequency increases, and so it tracks perceptual
importance. Therefore, differences in variance provide a convenient mechanism for
assigning subbands to a layered hierarchy. Second, grouping subbands with similar
variances is convenient since each group can employ a common quantizer. Several
quantizer schemes allocate bits by varying quantizer step size in inverse proportion to
variance. This approach uses variance as an indication of the dynamic range exhibited by
the coefficients. One such scheme, described later, apportions bits in an attempt to
balance the distortion introduced across each subband [48].
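The variance measurement can be sketched as below, assuming each selected macroblock has already been split into a dict mapping subband names to coefficient blocks (for example by a uniform FHT split). Coefficients are pooled per subband across all frames, and the population variance from Python's `statistics` module is used; `subband_variances` and `rank_by_variance` are hypothetical helpers, and the two toy macroblocks are illustrative only.

```python
from statistics import pvariance


def subband_variances(decomposed_macroblocks):
    """Pool every coefficient of each named subband across all supplied
    macroblocks (from all frames) and return {subband name: variance}."""
    pooled = {}
    for bands in decomposed_macroblocks:
        for name, block in bands.items():
            pooled.setdefault(name, []).extend(v for row in block for v in row)
    return {name: pvariance(values) for name, values in pooled.items()}


def rank_by_variance(variances):
    """Subband names ordered from highest to lowest variance."""
    return sorted(variances, key=variances.get, reverse=True)


frames = [
    {"LL": [[10, 12], [14, 16]], "HH": [[0, 1], [-1, 0]]},
    {"LL": [[20, 22], [24, 26]], "HH": [[1, 0], [0, -1]]},
]
variances = subband_variances(frames)
print(variances, rank_by_variance(variances))
```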


[Figure: a macroblock is split by the FHT into uniform 8x8 subbands.]
Figure IV.9: Splitting a Macroblock into Uniform Subbands.


The subband variances after a first-order decomposition, computed over several test
video sequences, are shown in Table IV.4. The subband variances after a second-order
decomposition are shown in Table IV.5. Subband variance provides a good indication of
energy concentration within each subband. Since the video images are lowpass, the
energy is concentrated in the lowest subband, as shown in Table IV.4. By extension, the
subband variance also provides an indication of relative perceptual importance, an
observation that allows subband variance to dictate layer assignments. A second-order
decomposition further differentiates the frequency content found in the first-order
subbands. For example, after a second-level decomposition of the LH subband, energy is
concentrated in the LHLL and LHLH subbands. Values in Table IV.5 resemble the transpose
of Figure III.4 and demonstrate that subband variance strongly tracks the visual
components in the macroblock. This strengthens the argument for using subband
variances to make hierarchical layer assignments for video sequences.
Table IV.4: Subband Variances after a First-Order Decomposition (Motion Video).
Table IV.5: Subband Variances after a Second-Order Decomposition (Motion Video).


After variance data has been gathered for each subband at the desired level of
analysis, the next step is to group adjacent subbands exhibiting similar variances. The
criterion suggested by [73] is to group adjacent subbands $k_1$ and $k_2$, with variances
$\sigma_{k_1}^2$ and $\sigma_{k_2}^2$, respectively, when

[…]

where $\sigma_{max}^2$ and $\sigma_{min}^2$ represent the maximum and minimum variances
found among the subbands, and $N_s$ represents the total number of subbands.

Grouping of subbands based on the variances in Table IV.5 results in the
partitions shown in Figure IV.10. Assuming that the subbands are independent, the
variance of each partition $P_j$ is simply the sum of the variances of the subbands $k_i$
comprising that partition:

$$\sigma_{P_j}^2 = \sum_{k_i \in P_j} \sigma_{k_i}^2. \tag{IV.7}$$

Since the subbands comprising each partition have similar variances, each partition can
be quantized using the same scheme such that quantization errors are spread uniformly
among the subordinate subbands.
Figure IV.10: Partitions Resulting from Merge Algorithm.
Next, the resulting partitions are assigned to layers $L_j$, using the following set of
heuristic rules, until the requisite number of layers is created:

Rule 1: No layer may have a greater variance than any lower layer. That is, given
N layers,

$$\sigma_{L_1}^2 \ge \sigma_{L_2}^2 \ge \cdots \ge \sigma_{L_N}^2. \tag{IV.8}$$

Rule 2: Layers must be populated in order of increasing frequency content. A
layer may not contain a partition of lower frequency content than any layer below
it.

Rule 3: Partitions that meet the criterion given by Eq. (IV.7) are assigned to the
same layer even if the partitions are non-contiguous.

Rule 4: Partitions are applied to layers in a symmetric fashion.

Rule 5: If more than two subbands comprising a coarser subband remain as
partitions after applying the above rules, group all of the partitions comprising the
coarser subband together into one partition.

Rule 6: If one or more partitions are moved between layers, as required to
achieve a more balanced distribution of bit rates or quality, move the partition(s)
with the lowest variance if promoting to a higher layer and the partition(s) with the
highest variance if demoting to a lower layer.
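Rule 1 can be checked mechanically once partition variances are known. The sketch below follows Eq. (IV.7) in treating subbands as independent and additionally takes a layer's variance to be the sum of its partition variances, which is an assumption since the text does not define layer variance explicitly. The function names and the toy variances in the usage line are hypothetical, not values from Table IV.4 or Table IV.5.

```python
def partition_variance(subband_vars, members):
    """Eq. (IV.7): sum of the variances of the (assumed independent)
    subbands that make up one partition."""
    return sum(subband_vars[k] for k in members)


def satisfies_rule_1(subband_vars, layers):
    """Rule 1 / Eq. (IV.8). `layers` is ordered from the base layer upward;
    each layer is a list of partitions, each partition a list of subband
    names. Returns (rule holds, per-layer variances)."""
    layer_vars = [sum(partition_variance(subband_vars, p) for p in layer)
                  for layer in layers]
    holds = all(lower >= upper
                for lower, upper in zip(layer_vars, layer_vars[1:]))
    return holds, layer_vars


toy = {"LL": 900.0, "LHLL": 40.0, "HLLL": 35.0, "LHLH": 10.0, "HH": 5.0}
ok, per_layer = satisfies_rule_1(toy, [[["LL"]], [["LHLL", "HLLL"]], [["LHLH"], ["HH"]]])
print(ok, per_layer)
```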


The reasoning behind these rules stems from the requirements stated for layered
coder design at the start of this section. Rule 1 ensures that no upper layer receives a
greater bit allocation than the lower layers. This provides a more logical sequence to the
layer hierarchy, since the lower layers will make a greater contribution to reconstructed
quality, and quality loss due to layer dropping is more gradual. Rule 2 matches the layer
hierarchy to the observed frequency dependence of the human visual system (HVS) and
ensures a more graceful degradation in quality during periods of congestion. Rule 3
simplifies quantizer design by allowing non-contiguous partitions to use the same
quantization scheme. Rule 4 ensures that neither horizontal nor vertical detail dominates
a partially reconstructed frame. A lack of balance between these components distorts the
image and causes scene elements to appear elongated. Simplifying coder design and
minimizing processing delay are the main considerations for Rule 5. Each distinct
partition or subband transmitted requires overhead within the bit stream for the decoder
to correctly position the contributions. A greater number of subbands also complicates
quantizer design and rate control. Concatenating the single-subband partitions into their
coarser, parent subband offsets these concerns and reduces the computational burden
required to transform the macroblock, since an analysis step is dropped.

Rules 1-5 help determine an effective layering scheme for motion video.
However, implementation provides the final test of effectiveness. Two problems,
discovered in the first ad hoc approach attempted, may arise during implementation.
First, the resulting bit rate for a layer may be so small that bitstream overhead is too
high. Second, a layer may have a negligible impact on reconstructed quality. In either
case, the solution is to reduce the number of layers by concatenating the ineffectual layer
with an adjacent layer, or to move partitions between layers. The latter situation is
covered under Rule 6, which provides guidance for moving partitions between layers
without violating the other rules.

Application of these rules to the partitions shown in Figure IV.10 resulted in the
final layering scheme for motion video sequences shown in Figure IV.11. The LL
subband is assigned to layer I and further transformed via the DCT as previously
discussed. The HH subband is assigned to layer III in its entirety. The HL and LH
subbands are further decomposed, and the resulting subbands are partitioned and
assigned to layers II and III. The layer assignments in Figure IV.11 also provide the
basis for the quantization scheme discussed in the next section.

Figure IV.11: Final Layering Scheme for Motion Video Sequences.
The generalized layering scheme presented above is biased toward motion video
sequences. Consequently, the layering scheme presented in Figure IV.11 is not suitable
for static slide sequences. Static slide sequences show a much greater dependence on
higher frequency components for perceptual recognition, since text and line drawings
contain much more edge detail. Any hierarchical scheme based on the lowpass nature
exhibited by images yields a blurred reproduction with only the lower layers and gives
satisfactory results only when the high-frequency layers are added. For example,
applying the motion video layering scheme to slides containing text and line drawings
gives acceptable results only when all three layers are received, which defeats the
purpose of layering video. Therefore, a different layering scheme is appropriate if the
video stream is to include both types of sequences.

Although the general layering scheme presented above is not applicable to static
slide sequences, application of the split-and-merge algorithm is still meaningful. The
variances exhibited by the subbands generated after a first- and second-level analysis of
slide sequences consisting of text and line drawings are shown in Table IV.6 and Table
IV.7, respectively. Comparing these values to those for the motion video sequences
given earlier, it is evident that energy is much more evenly distributed among the
different subbands. The result is a much more complex relationship between variance
and perceptual importance, which is demonstrated in the close interdependence between
the various subbands.
Table IV.6: Subband Variances after a First-Level Decomposition (Slide Sequence).
Table IV.7: Subband Variances after a Second-Level Decomposition (Slide Sequence).


Applying the split-and-merge algorithm results in the partitions shown in Figure
IV.12. Using the layer assignment rules outlined above, partitions $P_1$, $P_2$, and $P_4$
are assigned to the base layer. However, reconstruction based solely on the base layer
gives very poor results. Even adding partitions $P_3$, $P_5$, and $P_6$ fails to achieve
acceptable results, although such an arrangement includes a large portion of the energy
contained in the macroblock. Therefore, unlike in the motion video case, variance alone
provides a very poor guide to perceptual relevance. Instead, achieving acceptable
reconstruction starting with the base layer requires contributions from each of the 8x8
subbands. In practice, the layering scheme shown in Figure IV.13 was found to be
suitable. The base layer consists of those 4x4 subbands containing the most significant
details as determined by variance. Although in motion sequences the LLLL subband is
expected to have a lowpass frequency characteristic consistent with the original
macroblock, this does not hold true for static sequences; therefore, application of the
DCT provides no additional benefit. The remaining subbands are divided among the
remaining layers in order of increasing frequency content.
Figure IV.13: Final Layering Scheme for Static Slide Sequences.


Although the partitions in Figure IV.12 do not directly lead to a satisfactory
layering arrangement, continuing the examination does lead to a simple quantization
scheme. After merging partitions with similar variances, the partitions are reduced to
those shown in Figure IV.14. Although partitions $P_2$ and $P_3$ are not close enough for
merging under Eq. (IV.5), they are sufficiently close in variance that the simplicity
gained by quantizing both bands together offsets any possible suboptimal bit
allocation. The final partitions, for the purpose of quantization, are shown in Figure
IV.15.
Figure IV.14: Partitions Remaining After Merging Similar Non-Contiguous Partitions.
Figure IV.15: Partitions for the Purpose of Quantization.
Since two different layering schemes are used, the coder requires a criterion for
determining which type of video is present. The determination is made following each
scene change. The coder judges that a scene change has occurred if the number of
macroblocks selected exceeds a threshold. After examining the block selection
statistics for motion video, a threshold three standard deviations above the mean block
selection rate was found to be high enough to avoid spurious scene change detections. If
a scene change has occurred, the coder examines the number of macroblocks selected
due to motion in the next frame. If this number is zero, the current sequence is assumed
to be static, since no motion has occurred within the scene. Otherwise, the sequence
is assumed to be a motion video sequence.
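The detection logic can be sketched as follows. The threshold is the mean plus three standard deviations of the per-frame block-selection counts from training motion video; using the population standard deviation (`statistics.pstdev`) rather than the sample one is an assumption, and `scene_change_threshold` and `classify_content` are hypothetical names. The counts in the usage line are toy values.

```python
from statistics import mean, pstdev


def scene_change_threshold(selection_counts):
    """Mean plus three standard deviations of the number of macroblocks
    selected per frame in representative motion video."""
    return mean(selection_counts) + 3 * pstdev(selection_counts)


def classify_content(selected_now, motion_selected_next, threshold):
    """Return None if no scene change is detected in the current frame;
    otherwise 'static' when no macroblocks are selected due to motion in
    the next frame, and 'motion' when at least one is."""
    if selected_now <= threshold:
        return None
    return "static" if motion_selected_next == 0 else "motion"


threshold = scene_change_threshold([10, 12, 8, 10, 10])
print(round(threshold, 2), classify_content(99, 0, threshold))  # prints 13.79 static
```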
4. Quantization and Lossless Coding


After the transform stage, individual subbands are quantized and losslessly coded 
according to their layer assignment (motion video sequences) or partition assignment 
(static sequences). The main difference is that the base layer for motion video sequences 
is encoded using the JPEG standard. Otherwise, uniform quantization is used with a 
single step size for each layer/partition followed by Huffman coding. 

The quantization and coding stage for motion video macroblocks is shown in 
Figure IV.16. The LL subband coefficients are quantized and encoded using the 
luminance quantization array and luminance VLC table suggested in [75]. This process 
is summarized in Chapter III. 

The remaining subbands are uniformly quantized using a fixed quantizer step size
for all coefficients in that subband. The quantizer step size is set independently for Q1
and Q2, and all subbands entering a particular quantizer use a common step size.


[Figure: the LL subband is JPEG coded; the remaining subbands are quantized and VLC
coded, with HLLH, HLHH, LHHL, LHHH, and HH passing through Q2 into Layer III.]
Figure IV.16: Quantization and Coding for Motion Video Macroblocks.


Unlike in JPEG encoding, zig-zag scanning of the quantized FHT coefficients
provides no apparent coding gain. Instead, trials indicated that a simpler horizontal raster
scan was adequate for all bands except the HL subband. The HL subband showed a
slight preference for a vertical raster scan, which seems consistent given the frequency
orientation of this band. The scan orders are summarized in Table IV.8, where the scan
order applies to the subband indicated as well as all of its child subbands. The LL entry
pertains only to coding of static macroblocks and is included for completeness.


Parent Subband    Scan Order
LL                Raster
LH                Raster
HL                Vertical Raster
HH                Raster

Table IV.8: Scan Order for Encoding Quantized Coefficients.
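A sketch of the per-subband quantizer and the scan orders of Table IV.8 follows. It assumes a midtread uniform quantizer (nearest integer level; note that Python's `round` resolves exact halves to the nearest even value) and derives a child subband's parent from the first two letters of its name. `quantize_uniform` and `scan` are hypothetical names.

```python
def quantize_uniform(block, step):
    """Quantize every coefficient of a subband with one common step size."""
    return [[int(round(v / step)) for v in row] for row in block]


def scan(block, subband):
    """Order coefficients per Table IV.8: vertical raster (column by column)
    for HL and its children, horizontal raster (row by row) otherwise."""
    parent = subband[:2]  # e.g. 'HLLH' -> 'HL'
    if parent == "HL":
        return [block[r][c] for c in range(len(block[0])) for r in range(len(block))]
    return [v for row in block for v in row]


q = quantize_uniform([[9.6, -4.1], [0.4, 20.2]], step=4)
print(scan(q, "HLLH"))  # prints [2, 0, -1, 5]
```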


After scanning, each non-zero coefficient is losslessly coded using a Huffman
VLC code. The coding scheme chosen mirrors the 3-D event structure employed by the
H.263 coding standard. Each non-zero coefficient is replaced by an equivalent event
described by three parameters {LAST, RUN, LEVEL} [56], where LAST indicates
whether there are any more non-zero coefficients in the current subband, RUN indicates
the number of successive zeros that precede the non-zero coefficient, and LEVEL
represents the magnitude of the quantized coefficient. Each event maps to a VLC
codeword, to which a sign bit is appended to represent the sign of the coefficient. A
VLC table was derived for motion sequences using a series of representative test
sequences [74].
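The event formation step (before the VLC table lookup, whose contents are not reproduced here) can be sketched as follows. The sign bit is carried as a fourth field, with 1 marking a negative coefficient; that convention and the name `events_3d` are assumptions.

```python
def events_3d(coeffs):
    """Turn a scanned list of quantized coefficients into
    (LAST, RUN, LEVEL, SIGN) tuples, one per non-zero coefficient."""
    nonzero = [i for i, v in enumerate(coeffs) if v != 0]
    events = []
    prev = -1  # index of the previous non-zero coefficient
    for n, i in enumerate(nonzero):
        last = 1 if n == len(nonzero) - 1 else 0  # 1: no non-zero values follow
        run = i - prev - 1                           # zeros since previous event
        level = abs(coeffs[i])
        sign = 1 if coeffs[i] < 0 else 0
        events.append((last, run, level, sign))
        prev = i
    return events


print(events_3d([5, 0, 0, -2, 0, 1, 0, 0]))
# prints [(0, 0, 5, 0), (0, 2, 2, 1), (1, 1, 1, 0)]
```

Trailing zeros after the final event need no codeword, since LAST already tells the decoder that the rest of the subband is zero.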

The quantization and coding stage for static macroblocks is shown in Figure 
IV.17. The major difference compared to motion macroblocks is that JPEG is not 
employed. Instead, the sixteen subbands are supplied to one of three independent 
uniform quantizers, Q1, Q2, and Q3, each with a fixed quantizer step size. After 
quantization, each non-zero coefficient is replaced by a 3-D VLC codeword as described
above, although a different VLC table is employed. Again, the VLC table was developed
from a series of representative sequences [74]. 

Neither Figure IV.16 nor Figure IV.17 indicates the presence of the control signal
from the Control Unit shown in Figure IV.5. The control signal allows manipulation of
the quantizer step sizes, or of a scaling factor in the case of the JPEG quantizer, as
required by a rate control scheme. Rate control schemes are covered later in this chapter.


[Figure: static macroblock subbands LLLH, LHLL, LHLH, LLHL, HLLL, and HLHL are VLC
coded into Layer II; LLHH, HLLH, HLHH, LHHL, LHHH, HHLL, HHLH, HHHL, and HHHH
pass through Q3 into Layer III.]
Figure IV.17: Quantization and Coding for Static Macroblocks.


D. RESULTS 


This section includes some example video traces for a short video segment
consisting of 100 frames of a single speaker followed by 50 frames of a presentation slide
filled with line diagrams and text. A sample frame from each sequence is shown in
Figure IV.18 and Figure IV.19. Each figure shows the original frame and the
reconstructed frame with only the base layer received, with the base layer and the first
enhancement layer received, and with all layers received. With the exception of scene
changes, the coder employed no rate control for these sequences; a single set of
quantizers is used for each sequence and not varied during the run. During a scene
change, the first new frame of the scene is heavily compressed to avoid spikes in the
outgoing bit rate. The quantizers employed produced an average bit rate of 80 kbps for
the motion video sequence and 40 kbps for the static sequence, although the bit rate
would be expected to vary locally.
Panels: original frame; base layer only; base layer and first enhancement layer; layers 1, 2, and 3.
Figure IV.18: Original and Reconstructed Frames From a Motion Video Sequence.


Figure IV.20 and Figure IV.21 show the bit rate trace for the combined sequences
and the plot of pSNR as a measure of reconstructed video quality (see Eq. (III.7)). The
granularity in bit rate offered by a layered video hierarchy is evident in Figure IV.20; as
congestion occurs, the lower layers could be retained while preserving most of the
quality. The bit rate ratio among layers is approximately 5:3:2 for both sequences. As
expected, the bit rate for the static sequence is much lower, since it results solely from
macroblock aging. For this reason, rate control is not of significant benefit for the
static sequences. Using a pointer within the overhead slide would cause macroblocks to
be selected due to motion and increase the bit rate slightly, but the bit rate would still
not reach the level displayed for the motion video sequence.
Panels: original frame; base layer only; base layer and first enhancement layer; layers 1, 2, and 3.
Figure IV.19: Original and Reconstructed Frames From a Static Video Sequence.


Figure IV.21 illustrates the progressive improvement in quality as additional
layers are added to the base layer. At the beginning of each sequence, quality in terms of
pSNR improves sharply over the aging interval following a scene change. After this
period, quality remains relatively flat for each sequence regardless of the number of
layers, as expected since no attempt is made to vary the bit rate. For the motion video
sequence, the base layer provides a smoothed but acceptable display. Text in the frame
is not readable, but the speaker's movements are easy to follow. Adding the first
enhancement layer improves sharpness and adds 4 dB in pSNR, although small text is
still difficult to discern. The second enhancement layer adds only 1-2 dB, but small text
is finally readable and other features with fine edges are sharper. With static video, the
role of the enhancement layers is even more dramatic.

Even though most of the macroblock's energy is included in the base layer and
contributions from each frequency band are included, the base layer still shows a large
degree of smoothness, although the shapes are readily identifiable. Adding the first
enhancement layer adds a 7 dB improvement and dramatically improves sharpness. The
final layer, even though its bit rate contribution is the smallest of the three layers,
almost doubles the pSNR, and the reconstructed frame is virtually identical to the
original frame.
Figure IV.20: Bitrate per Frame for the Layered Video Sequence.
Figure IV.21: Reconstructed pSNR for the Layered Video Sequence.

E. SIMPLE LAYERED-VIDEO RATE CONTROL


Compressed video is variable bit rate by nature since compression gain varies 
based on scene activity and complexity. However, transmission channels inevitably 
require some constraints on bit rate because of channel capacity or QoS constraints. 

Most commonly, bit rate is constrained to maintain a constant rate or a constant
local-average bit rate over time. Many factors affect bit rate, but the most important is
the tradeoff between quantizer step size and image fidelity. A larger step size results in
a lower bit rate and a larger amount of distortion. Reducing the step size increases the
bit rate but reduces the amount of distortion. Rate control, therefore, requires evaluation
of the rate-distortion relationship created by a particular coder design. The rate control
problem may be posed in terms of this relationship: the goal of the encoder is to minimize
distortion $D$ subject to a bit constraint $R_c$, i.e., $R \le R_c$ [53]. This problem is
solved using Lagrangian optimization by expressing a cost function in terms of a
distortion term weighted against a rate term [48]. The optimal solution is one that
minimizes the cost function $J$, given by

$$J = D + \lambda R, \tag{IV.9}$$

where $\lambda$ is the Lagrange multiplier. Expressing distortion as a function of rate,
$D(R)$, and differentiating with respect to $R$ to find a minimum results in

$$\frac{\partial J}{\partial R} = \frac{\partial D(R)}{\partial R} + \lambda = 0, \tag{IV.10}$$

which indicates that each Lagrange multiplier $\lambda$ yields a particular optimal solution,
namely the point at which the slope of the rate-distortion curve equals $-\lambda$. Each
tangential point on the rate-distortion curve therefore corresponds to an optimal solution
for a particular rate constraint. Figure IV.22 shows a possible rate-distortion curve and
an optimal solution for a bit rate of $R_0$. While the true rate-distortion curve is
guaranteed to be convex [48], the operational curve is influenced by the coder design,
including the motion-prediction scheme employed, the quantizer design, and lossless
coding gains. Therefore, rate control schemes tend to only approximate the
rate-distortion relationship when determining a method for varying quantizer step size to
achieve the desired bit rate.
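The Lagrangian formulation of Eqs. (IV.9) and (IV.10) can be illustrated on a finite set of measured operating points. The sketch below simply evaluates $J = D + \lambda R$ for each (rate, distortion) pair and keeps the minimum; the points are toy values and `lagrangian_choice` is a hypothetical name.

```python
def lagrangian_choice(points, lam):
    """Return the (rate, distortion) pair minimizing J = D + lam * R."""
    return min(points, key=lambda p: p[1] + lam * p[0])


points = [(10, 100.0), (20, 50.0), (30, 30.0), (40, 25.0)]
for lam in (10.0, 3.0, 0.1):
    print(lam, lagrangian_choice(points, lam))
# selects (10, 100.0), then (20, 50.0), then (40, 25.0)
```

Sweeping $\lambda$ from large to small moves the selected point from low to high rate along the lower convex hull of the measured points; points lying above that hull are never selected for any $\lambda$, which is one reason a non-convex operational curve is only approximated by Lagrangian methods.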


[Plot: distortion versus rate $R$, with the operating point $R_0$ marked on the rate axis.]
Figure IV.22: Rate-Distortion Curve and a Possible Optimal Solution.


With any rate-control scheme, two issues are of importance. First, changes to 
quantizer step size must be communicated to the decoder, which adds to the coder’s 
overhead depending on how often the parameter is changed. Second, rate-control
schemes must be kept reasonably simple for real-time applications to minimize coding 
delay. 

Numerous feedback control schemes for rate control have been proposed that
track actual bit allocation in some manner and use feedback to vary quantizer step size.
The H.261 standard [67] suggests an approach described as liquid level control [6]. The
H.261 reference coder examines the output buffer every 11 macroblocks. If the buffer is
full, quantizer step size is increased. If the buffer is nearing empty, quantizer step size is
decreased. H.261 leaves the actual rate control scheme up to the designer. One feasible
approach is the feedback control scheme proposed by Choi and Park [76] that controls the
Lagrange multiplier $\lambda$ based on the output buffer state. Low-delay rate control
approaches have been described by Telenor Research [55] and Ribas-Corbera and Lei [77]
for H.263 and H.263+, respectively. The Telenor approach linearizes the relationship
between quantizer size and bit rate. At the start of each frame, the coder determines the
deviation between the bits allocated to the last frame, $B_{i-1}$, and the target bit
allocation $B$, i.e.,

$$\Delta B_{i-1} = B_{i-1} - B. \tag{IV.11}$$
The coder also attempts to allocate an equal number of bits to each macroblock while
encoding the current frame and tracks this deviation using the relationship

$$\Delta B_i = B_i - \frac{m_B}{N_{MB}}\,B, \tag{IV.12}$$

where $B_i$ is the number of bits spent so far in the current frame, $m_B$ represents the
sequence number of the current macroblock, and $N_{MB}$ the total number of
macroblocks. Then, at the beginning of each new macroblock, the coder updates the
quantizer step size based on these deviations:


$$Q_i = \bar{Q}_{i-1}\left(1 + \frac{\Delta B_{i-1}}{2B} + \frac{12\,\Delta B_i}{R}\right), \tag{IV.13}$$


where $R$ is the allocated channel bit rate and $\bar{Q}_{i-1}$ is the average quantizer size
in the previous frame. Telenor's approach gives an equal weighting to each macroblock.
The approach taken by [77] is similar but computes an optimal quantizer step size for each
macroblock within the bit budget using the variance exhibited by each macroblock as
well as a heuristic weight indicating the perceptibility of decoding artifacts.
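A per-macroblock update in the spirit of Eqs. (IV.11) to (IV.13) can be sketched as below. It assumes the TMN5 form $Q_i = \bar{Q}_{i-1}(1 + \Delta B_{i-1}/2B + 12\,\Delta B_i/R)$, rounds to the nearest integer, and clamps the result to the H.263 quantizer range 1 to 31; the function and argument names are hypothetical, and the numbers in the usage line are toy values.

```python
def tmn5_quantizer(q_prev_avg, bits_prev_frame, bits_so_far, mb_index,
                   n_mb, bits_per_frame, bit_rate, q_min=1, q_max=31):
    """Quantizer for the next macroblock: a global correction from last
    frame's overshoot plus a local correction from this frame's overshoot."""
    delta_frame = bits_prev_frame - bits_per_frame                # Eq. (IV.11)
    delta_mb = bits_so_far - (mb_index / n_mb) * bits_per_frame   # Eq. (IV.12)
    q = q_prev_avg * (1 + delta_frame / (2 * bits_per_frame)
                      + 12 * delta_mb / bit_rate)                 # Eq. (IV.13)
    return max(q_min, min(q_max, int(round(q))))


# Previous frame overshot its 8000-bit target by 3200 bits: step size rises.
print(tmn5_quantizer(10, 11200, 0, 0, 99, 8000, 80000))  # prints 12
```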

The issue of rate control for layered video has not been well addressed in the
literature. The rate-control problem is complicated by the multi-dimensional nature of
the rate-distortion curve, which expresses overall distortion as a function of an
n-dimensional set of quantizers. In the coder presented here, the bit rate depends on a set
of three quantizers. Two approaches are presented below. The first is based on a
traditional rate-distortion approach that assumes that both rate and distortion for each
layer are additive. The second approach uses vector quantization to reduce the
dimensionality of the control problem and approximates an optimal rate-distortion curve.
1. A Rate-Distortion Approach 


For a layered coder, separate quantizers are employed for each layer. Assuming
that distortion for each layer $i$ is additive, the rate control problem becomes minimizing

$$D = \sum_{i=0}^{N-1} D_i \tag{IV.14}$$

subject to the constraint

$$\sum_{i=0}^{N-1} R_i \le R_c. \tag{IV.15}$$

The assumption that each layer behaves independently allows the cost function to be
rewritten as [48]

$$J_i(R_i) = D_i(R_i) + \lambda R_i, \tag{IV.16}$$

$$J = \sum_{i=0}^{N-1} J_i. \tag{IV.17}$$

Since the costs are additive, $J$ is minimized when each $J_i$ is minimized. Taking the
derivative of Eq. (IV.16) to find the minimum results in

$$\frac{\partial J_i}{\partial R_i} = \frac{\partial D_i(R_i)}{\partial R_i} + \lambda = 0. \tag{IV.18}$$

Therefore, a particular bit rate $R$ is optimal when each $R_i$ corresponds to points with
the same slope, $-\lambda$, on their respective rate-distortion curves.


The distortion $D_i$ introduced by quantization is related to the rate $R_i$ by [78]

$$D_i(R_i) = C_i\,\sigma_i^2\,2^{-2R_i}, \tag{IV.19}$$

where $C_i$ depends on the pdf of the quantized variable, and $\sigma_i^2$ is the variance of
the input values. Using this relationship, and taking the constant $C_i$ to be the same for
every layer, the Lagrangian method yields the following optimal solution [48]:

$$R_i = \bar{R} + \frac{1}{2}\log_2\frac{\sigma_i^2}{\rho^2}, \tag{IV.20}$$

where $\bar{R} = R/N$ is the mean bit rate per layer, $N$ is the number of layers, and $\rho^2$ is
the geometric mean of the layer variances,

$$\rho^2 = \prod_{i=0}^{N-1}\left(\sigma_i^2\right)^{1/N}. \tag{IV.21}$$

The allocation given by Eq. (IV.20) ensures that each quantizer has the same average
distortion.
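Eqs. (IV.20) and (IV.21) translate directly into code. The first function below is the closed-form allocation; the second is one possible way to force the non-negative allocations discussed later in this section (give zero bits to layers whose allocation would be negative and re-solve for the rest), which is an assumption since the text does not specify the correction. Both assume positive variances and a positive bit budget; the names are hypothetical.

```python
import math


def allocate_bits(total_bits, variances):
    """Eq. (IV.20): R_i = R_bar + 0.5 * log2(var_i / rho2), with R_bar the
    mean bits per layer and rho2 the geometric mean variance (Eq. IV.21)."""
    n = len(variances)
    r_bar = total_bits / n
    log2_rho2 = sum(math.log2(v) for v in variances) / n
    return [r_bar + 0.5 * (math.log2(v) - log2_rho2) for v in variances]


def allocate_bits_nonneg(total_bits, variances):
    """Repeat Eq. (IV.20) over the layers that still receive positive bits,
    giving zero bits to any layer whose allocation would be negative."""
    active = list(range(len(variances)))
    while True:
        alloc = allocate_bits(total_bits, [variances[i] for i in active])
        if min(alloc) >= 0:
            break
        active = [i for i, r in zip(active, alloc) if r > 0]
    result = [0.0] * len(variances)
    for i, r in zip(active, alloc):
        result[i] = r
    return result


print(allocate_bits(6, [16, 4, 1]))          # prints [3.0, 2.0, 1.0]
print(allocate_bits_nonneg(2, [256, 1, 1]))  # prints [2.0, 0.0, 0.0]
```

With the first allocation, every layer ends up with the same modeled distortion $\sigma_i^2 2^{-2R_i} = 0.25$, which is the equal-distortion property stated above.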

Using Eq. (IV.20), one possible frame-based rate control scheme could be
implemented as follows. First, establish the bit allocation $R$ for the current frame.
Then, calculate the bit allocation for each layer using Eq. (IV.20). Finally, allocate the
bits evenly among the coefficients of each layer. The bits allocated per coefficient are
used to calculate the quantizer step size. The following relationship is suggested to
calculate quantizer step size for low bit rate video traffic [79]:


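[Eq. (IV.22), illegible in the source: the quantizer step size for layer $i$ expressed in
terms of $e$, $\sigma_i^2$, and the bits allocated per coefficient.]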


where $e$ is Napier's constant and $\sigma_i^2$ is the variance of subband $i$. For
macroblocks that use multiple quantizer step sizes, such as in JPEG coding, the result is
used to establish the average step size for the macroblock.

There are several drawbacks with this approach to rate control. The most
important is that equalizing the average distortion at each quantizer does not account for
the perceptibility of errors in different frequency bands; allocating errors in a different
manner could give perceptually better results. The allocation also depends on
knowledge of the variances exhibited by each layer. Although representative variances
may be calculated a priori using test sequences, more accurate allocation requires
dynamically estimating the variances, a computationally expensive procedure. Another
problem is that Eq. (IV.20) may lead to negative bit allocations if the difference in
variances between layers is large. This problem is correctable by forcing non-negative
allocations in Eq. (IV.20), although the resulting allocations would not be optimal.
Finally, Eq. (IV.20) does not take coding gain into account. Therefore, using $R$ as the
target bit allocation leads to bit allocations that are too low once VLC coding is taken
into account. One ad hoc fix is to replace $R$ in the expression with

[Eq. (IV.23), illegible in the source: the adjusted allocation expressed in terms of the coding gain $G$.]
where G is the estimated coding gain expected from the entropy encoder. 

More sophisticated algorithms using the rate-distortion concept are available. For
example, “greedy” schemes allocate bits one at a time to the quantizer demonstrating the
most distortion [78]. Other schemes apply Lagrange multipliers to arbitrary
rate-distortion curves [80]. However, computational complexity and delay limit the
feasibility of more advanced methods when dealing with real-time video.
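A greedy allocator of the kind cited above can be sketched with a priority queue. It assumes the distortion model of Eq. (IV.19) with $C_i = 1$, so each extra bit divides a layer's distortion by four, and it hands out whole bits one at a time; `greedy_allocation` is a hypothetical name.

```python
import heapq


def greedy_allocation(total_bits, variances):
    """Give one bit at a time to the layer whose modeled distortion
    D_i = var_i * 2**(-2 * R_i) is currently largest (Eq. IV.19, C_i = 1)."""
    bits = [0] * len(variances)
    heap = [(-v, i) for i, v in enumerate(variances)]  # negate for a max-heap
    heapq.heapify(heap)
    for _ in range(total_bits):
        neg_d, i = heapq.heappop(heap)
        bits[i] += 1
        heapq.heappush(heap, (neg_d / 4.0, i))  # one more bit: D_i / 4
    return bits


print(greedy_allocation(6, [16, 4, 1]))  # prints [3, 2, 1]
```

For these variances the greedy result matches the closed form of Eq. (IV.20), but unlike the closed form it can never produce a negative allocation.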




2. Approximation of the 3-D Rate-Distortion Curve


The approach above assumes that distortion is additive in the operational coder 
and gives the same average distortion for each quantizer regardless of the relative 
perceptual importance of errors in each layer. The assumption of additive distortion 
implies that a decrease in rate requires a suitable decrease in all quantizer parameters to 
yield an optimal solution. Rate-distortion curves in the operational coder are not 
necessarily convex, so the above approach does not necessarily yield optimal results. An 
alternate, albeit heuristic, approach is to simplify the control problem by creating a
simplified, operational rate-distortion curve. 

An operational rate-distortion curve is created by first plotting total bit rate and
distortion (as measured by pSNR) separately in a three-dimensional space spanned by the
set of candidate quantizers for a series of motion video sequences. This process captures
the operational effect of the coder design, such as the quantizers and VLC coding, as well
as any interdependence between layers, on the rate-distortion relationship. The result is
best described as a 4-D surface wherein both rate and distortion are functions of a triplet
of quantizer parameters $\{q_1, q_2, q_3\}$. The first parameter represents the JPEG scaling
factor, while the remaining parameters represent the actual quantizer step sizes.

Next, the points representing the pSNR surface are sorted in ascending order and
associated with their corresponding quantizer triplets. For those triplets producing
approximately the same pSNR, only the point with the smallest bit rate is retained. The
result is an implicit vector quantization of the operational 3-D rate-distortion surface.
The dimensionality of the operational rate-distortion curve is therefore reduced to a 1-D
curve covering the operational range of the coder, as shown in Figure IV.23. Each point
on the curve represents results from a single quantizer triplet. The corresponding
quantizer triplets are plotted in Figure IV.24. The results indicate that an optimal rate
control scheme does not necessarily increase or decrease each quantizer parameter in
lockstep, as would be expected if distortion in each layer were independent.
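The pruning step that turns the 3-D measurements into a 1-D table can be sketched as follows. What counts as "approximately the same pSNR" is not specified in the text; here it is assumed to mean within a tolerance `tol_db` of the first point in a run of sorted points. The measurement tuples in the usage example are toy values, and `reduce_to_1d` is a hypothetical name.

```python
def reduce_to_1d(measurements, tol_db=0.1):
    """measurements: iterable of (triplet, bit_rate, psnr_db).
    Sort by pSNR; within each run of points whose pSNR lies within tol_db
    of the run's first point, keep only the lowest-rate triplet.
    Returns the retained entries in ascending pSNR order."""
    table, group = [], []
    for m in sorted(measurements, key=lambda m: m[2]):
        if group and m[2] - group[0][2] > tol_db:
            table.append(min(group, key=lambda g: g[1]))
            group = []
        group.append(m)
    if group:
        table.append(min(group, key=lambda g: g[1]))
    return table


meas = [((1, 4, 8), 120.0, 30.02), ((2, 2, 16), 95.0, 30.05),
        ((1, 2, 4), 160.0, 32.40), ((2, 4, 8), 90.0, 28.10)]
table = reduce_to_1d(meas)
print([t for t, _, _ in table])  # prints [(2, 4, 8), (2, 2, 16), (1, 2, 4)]
```

The position of each retained triplet in the list can then serve as the quantizer table index used by a table-lookup rate controller.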
Figure IV.23: Operational Rate-Distortion Curve for Motion Video.
Figure IV.24: Quantizer Table Triplet Values for Motion Video.


Reducing the rate-distortion relationship to a suboptimal 1-D relationship
provides a potential method for a simplified layered rate control scheme, since the set of
possible quantizer parameters is reduced to a more manageable set of suboptimal
parameters. Considering each triplet as a suboptimal quantizer state, a feedback control
scheme manipulates the quantizers for each layer by selecting only entries from this set
via table lookup. One possible method is considered next.

Using the operational rate-distortion curve, a control curve relating bits per frame to each suboptimal quantizer vector is created, as shown in Figure IV.25. After the control curve is linearized over the operational range of the coder, its slope represents the average increment or decrement in bits per frame for a step change in the quantizer table. Dividing this quantity by the average number of macroblocks selected per frame in the test sequences yields the desired control parameter β:

$$\beta = \frac{1}{\bar{N}_{MB}} \, \frac{\Delta B}{\Delta Q} \qquad \text{(IV.24)}$$

where $\bar{N}_{MB}$ represents the average number of macroblocks selected per frame and $\Delta B / \Delta Q$ is the slope of the control curve. In Figure IV.25, β was determined to be -11.0 bits/macroblock-step. The control parameter is then used to adjust the coder quantizer vector with each new frame according to the following scheme.
Figure IV.25: Operational Rate Control Curve.
At call setup, the average bit allocation per frame is set to

$$B_{target} = \frac{R}{f} \qquad \text{(IV.25)}$$
where R is the channel bit rate and f is the frame rate. For each new frame i, we use the actual bit allocation from the last frame i - 1 to estimate the bit allocation error, or deviation, expected for the current frame i if the quantizer vector used in the last frame is not changed. Accounting for the change in the number of macroblocks selected between the last and current frames, the expected deviation is

$$\Delta B_{inter} = B_{target} - \frac{N_{MB_i}}{N_{MB_{i-1}}} \, B_{i-1} \qquad \text{(IV.26)}$$

where $B_{i-1}$ is the number of bits actually produced for frame i - 1 and $N_{MB_i}$ is the number of macroblocks selected in frame i.

The required change in the quantizer setting is calculated using the deviation $\Delta B_{inter}$, the number of macroblocks selected for transmission in the current frame $N_{MB_i}$, and the control parameter:

$$\Delta Q_i = \left[ \frac{\Delta B_{inter}}{N_{MB_i} \, \beta} \right] \qquad \text{(IV.27)}$$

where $[\,\cdot\,]$ is the fix (integer truncation) operator, which discards the fractional portion of the result.


The result indicates that the quantizer setting from the last frame should be incremented or decremented by $\Delta Q_i$. If the quantizer has reached the upper or lower limit of the table, the value is not changed.
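A minimal sketch of the frame-level controller of Eqs. (IV.24) through (IV.27) follows. It assumes that the codebook of suboptimal quantizer vectors is addressed by an integer index in which a larger index means coarser quantization (consistent with a negative β). The class and function names, the 32-entry table size, and the control-curve points passed to `estimate_beta` are hypothetical, and a least-squares line is used here as one way of linearizing the control curve.

```python
import math
from typing import Sequence


def estimate_beta(q_index: Sequence[int], bits_per_frame: Sequence[float],
                  mean_mbs_per_frame: float) -> float:
    """Eq. (IV.24): least-squares slope of the control curve (bits per frame
    per table step) divided by the average number of selected macroblocks."""
    n = len(q_index)
    mq = sum(q_index) / n
    mb = sum(bits_per_frame) / n
    sxy = sum((q - mq) * (b - mb) for q, b in zip(q_index, bits_per_frame))
    sxx = sum((q - mq) ** 2 for q in q_index)
    return (sxy / sxx) / mean_mbs_per_frame


class FrameRateController:
    """Frame-level rate control over a table of suboptimal quantizer vectors.

    Only the table index is tracked; looking up the actual quantizer
    triplet for that index is left to the coder.
    """

    def __init__(self, channel_rate_bps: float, frame_rate: float,
                 beta: float, table_size: int, q_init: int) -> None:
        self.b_target = channel_rate_bps / frame_rate  # Eq. (IV.25)
        self.beta = beta
        self.table_size = table_size
        self.q = q_init

    def next_index(self, prev_bits: float, prev_mbs: int, cur_mbs: int) -> int:
        if prev_mbs == 0 or cur_mbs == 0:
            return self.q  # nothing to scale from, or nothing to code
        # Eq. (IV.26): deviation expected if the last quantizer is kept
        delta_b = self.b_target - (cur_mbs / prev_mbs) * prev_bits
        # Eq. (IV.27): the fix operator truncates toward zero
        delta_q = math.trunc(delta_b / (cur_mbs * self.beta))
        # Hold the index at the upper or lower end of the table
        self.q = min(max(self.q + delta_q, 0), self.table_size - 1)
        return self.q


beta = estimate_beta([0, 1, 2, 3], [10000, 9450, 8900, 8350], 50)
ctrl = FrameRateController(80_000, 10, beta, table_size=32, q_init=10)
print(beta, ctrl.next_index(prev_bits=9000, prev_mbs=50, cur_mbs=50))  # -11.0 11
```

Clamping the new index to the table reproduces the rule that the setting is left unchanged once the upper or lower limit has been reached.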

Video traces for a rate-controlled video sequence and a video sequence using only open-loop control are shown in Figure IV.26. Open-loop control consists of selecting the quantizer setting that yields the bit rate closest to the desired one and then leaving the setting unchanged for the duration of the sequence. In each case, the target bit rate was 80 kbps. The results indicate that the frame-based rate controller closely maintains the local average and also smooths the bit rate somewhat, as measured by each sequence's variance. As presented in the next chapter, smoothing the bit rate increases multiplexer efficiency and reduces bandwidth requirements. The drawback of rate control is a slightly larger frame-to-frame variation in quality relative to open-loop control, as shown in Figure IV.27. The statistics for each sequence are listed in Table IV.9. A variation of this approach was also examined in which the window used to predict the current deviation is widened from just one frame to m frames, in an attempt to reduce bit variations. Offline coders look back m frames to calculate the deviation [81], but increasing the search window as in

$$\Delta B_{inter} = m \, B_{target} - \sum_{j=1}^{m} \frac{N_{MB_i}}{N_{MB_{i-j}}} \, B_{i-j} \qquad \text{(IV.28)}$$

actually resulted in looser tracking in the sequences examined.
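The windowed deviation of Eq. (IV.28) can be written as a short function; the name `windowed_deviation` is hypothetical. With m = 1 it reproduces Eq. (IV.26). The second call shows that an overshoot in frame i - 1 and an equal undershoot in frame i - 2 cancel, so the wider window does not react to the most recent overshoot at all.

```python
from typing import Sequence


def windowed_deviation(b_target: float, cur_mbs: int,
                       past_bits: Sequence[float],
                       past_mbs: Sequence[int]) -> float:
    """Eq. (IV.28): deviation predicted from the last m frames.

    past_bits[j-1] and past_mbs[j-1] describe frame i-j for j = 1..m.
    With m = 1 this reduces to Eq. (IV.26).
    """
    m = len(past_bits)
    # Scale each past frame's bits to the current macroblock count
    predicted = sum(cur_mbs / n * b for b, n in zip(past_bits, past_mbs))
    return m * b_target - predicted


print(windowed_deviation(8000, 50, [9000], [50]))            # -1000.0
print(windowed_deviation(8000, 50, [9000, 7000], [50, 50]))  # 0.0
```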
Figure IV.26: Bit Rate Traces for a) Controlled and b) Uncontrolled Video Sequences.
Changing the quantizer only at the beginning of each frame may provide insufficient granularity to adequately suppress deviations from the desired bit rate. In this case, a more desirable approach is to examine the quantizer vector at each macroblock and make changes as required to control the distribution of target bits among the macroblocks. However, this approach is more complex than frame-based control and may cause quality variations throughout the frame during high-activity periods. One simple scheme is to distribute the average bit allocation for each frame evenly among all the selected macroblocks, in a manner similar to the Telenor rate control scheme [55]. Given that $N_{MB_i}$ macroblocks are selected in the current frame and an average bit allocation of $B_{target}$ bits is used, each macroblock receives $B_{target}/N_{MB_i}$ bits.


Controlling bit rate at the macroblock level is performed as follows. At call setup, the average bit allocation per frame is set to

$$B_{target} = \frac{R}{f} \qquad \text{(IV.29)}$$

where R is the channel bit rate and f is the frame rate. For the first macroblock of the new frame i, we calculate the expected deviation in the bits allocated to the current frame if the quantizer setting from the last frame is not changed, as above, and apportion this deviation over the number of macroblocks selected. This value is used to determine the change required in the quantizer setting for the first macroblock:





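$$\Delta B_{inter} = B_{target} - \frac{N_{MB_i}}{N_{MB_{i-1}}} \, B_{i-1}, \qquad \Delta Q_{i,1} = \left[ \frac{\Delta B_{inter}}{N_{MB_i} \, \beta} \right] \qquad \text{(IV.30)}$$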
Figure IV.27: pSNR Variation for a) Controlled and b) Uncontrolled Video Sequences.

Parameter                     With Rate Control    Without Rate Control
Mean Bit Rate (bits/frame)    7998                 7454
Bit Rate STD (bits/frame)     942                  1362
Mean pSNR (dB)                29.83                29.51
pSNR STD (dB)                 [illegible]          1.74

Table IV.9: Rate Controlled and Uncontrolled Sequence Statistics.
For each remaining macroblock j, j = 2 to $N_{MB_i}$, we calculate the deviation between the bits allocated so far within the frame and the target linear distribution. Assuming that the number of bits allocated so far within the current frame is $B_{i,j-1}$, and given the target allocation of $B_{target}/N_{MB_i}$ bits per macroblock underlying Eq. (IV.30), the deviation at macroblock j is

$$\Delta B_{intra} = \frac{j-1}{N_{MB_i}} \, B_{target} - B_{i,j-1} \qquad \text{(IV.31)}$$

This deviation is then used to set the quantizer parameter for the current macroblock:

$$\Delta Q_{i,j} = \left[ \frac{\Delta B_{intra}}{\beta} \right] \qquad \text{(IV.32)}$$
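The macroblock-level scheme of Eqs. (IV.29) through (IV.32) is sketched below. It assumes that the index change of Eq. (IV.32) is applied to the index used for the previous macroblock, and it replaces the real coder with a hypothetical `code_macroblock` hook and a toy cost model of 400 - 11q bits per macroblock; neither comes from this work.

```python
import math
from typing import Callable, List, Tuple


def control_frame(b_target: float, beta: float, q_start: int,
                  prev_bits: float, prev_mbs: int, n_mbs: int,
                  code_macroblock: Callable[[int, int], float],
                  table_size: int) -> Tuple[List[int], float]:
    """Macroblock-level rate control for one frame, Eqs. (IV.30)-(IV.32).

    code_macroblock(j, q) is a hypothetical hook that codes the j-th selected
    macroblock (0-based) with table index q and returns the bits it used.
    """
    if n_mbs == 0:
        return [], 0.0

    def clamp(q: int) -> int:
        return min(max(q, 0), table_size - 1)

    # Eq. (IV.30): the frame-level prediction sets the first macroblock's index
    q = q_start
    if prev_mbs > 0:
        delta_b_inter = b_target - (n_mbs / prev_mbs) * prev_bits
        q = clamp(q + math.trunc(delta_b_inter / (n_mbs * beta)))
    indices = [q]
    spent = code_macroblock(0, q)

    for j in range(2, n_mbs + 1):  # j is 1-based, as in the text
        # Eq. (IV.31): linear target for macroblocks 1..j-1 minus bits spent
        delta_b_intra = (j - 1) * b_target / n_mbs - spent
        # Eq. (IV.32): step from the previous macroblock's index (assumed reading)
        q = clamp(q + math.trunc(delta_b_intra / beta))
        indices.append(q)
        spent += code_macroblock(j - 1, q)
    return indices, spent


def toy_coder(j: int, q: int) -> float:
    """Hypothetical cost model: each macroblock costs 400 - 11*q bits."""
    return 400.0 - 11.0 * q


indices, spent = control_frame(8000, -11.0, q_start=5, prev_bits=8000,
                               prev_mbs=20, n_mbs=20,
                               code_macroblock=toy_coder, table_size=32)
print(indices[:3], spent)  # [5, 0, 0] 7945.0
```

In the usage example the first macroblock undershoots its 400-bit share, so the controller immediately moves to a finer index and then holds at the table limit.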


One possible objection to rate control at the macroblock level using the scheme above is that the linear bit allocation across the selected macroblocks takes into account neither the level of activity within each macroblock nor the perceptual importance of individual macroblocks. Therefore, the linear approach can be generalized by introducing a weighting factor $W_j$ for each macroblock that represents the relative proportion of the bit allocation to be assigned to that macroblock:

$$B_{i,j} = W_j \, B_i \qquad \text{(IV.33)}$$

The only constraint placed on $W_j$ is that all weights sum to 1, so that

$$B_i = \sum_{j} B_{i,j} \qquad \text{(IV.34)}$$

The linear assignment scheme, with $W_j = 1/N_{MB_i}$, obviously meets this condition. Two approaches provide a means to tailor the bit allocation to macroblock activity level. First, the macroblock selection rate provides a heuristic indication of motion within the current scene. Given the set of macroblocks selected for the current frame, each macroblock's past selection history can be used to determine a selection probability $p_j$ relative to the current set. Such a selection probability provides a convenient measure of motion, since blocks that are selected more often tend to lie in regions of greater motion. Therefore, an appropriate weighting factor that emphasizes regions of greater motion is

$$W_j = p_j \qquad \text{(IV.35)}$$


However, the coder must refresh the selection counts after every scene change to avoid biasing the motion detection. Another approach is to weight the bit allocation by the variance exhibited by each macroblock, thereby allocating more bits to macroblocks with higher variance. A similar approach is followed in [77]. Using the rate-distortion allocation scheme outlined above, a weighting factor based on variance is

$$W_j = \frac{B_{i,j}}{B_i} = \frac{1}{N_{MB_i}} + \frac{1}{2 B_i} \log_2 \frac{\sigma_j^2}{\left( \prod_{k=1}^{N_{MB_i}} \sigma_k^2 \right)^{1/N_{MB_i}}} \qquad \text{(IV.36)}$$

where $B_i$ is the current frame bit allocation, $B_{i,j}$ is the allocation for the jth selected macroblock, and $\sigma_j^2$ is the variance of the coefficients in the jth macroblock. The only drawbacks to this scheme are that the weights may be negative and that the macroblock variance must be tracked, which increases computational overhead.
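The weighting schemes of Eqs. (IV.33) through (IV.35) reduce to a few helper functions. This sketch assumes that $p_j$ is the macroblock's past selection count normalized over the currently selected set, and it falls back to the linear weights when no history exists, for example right after the selection counts are refreshed at a scene change; the function names and the fallback are hypothetical. The variance-based weights of Eq. (IV.36) are not shown.

```python
from typing import Dict, List, Sequence


def linear_weights(n_mbs: int) -> List[float]:
    """W_j = 1 / N_MB for every selected macroblock."""
    return [1.0 / n_mbs] * n_mbs


def selection_weights(selected: Sequence[int],
                      selection_counts: Dict[int, int]) -> List[float]:
    """Eq. (IV.35): W_j = p_j, with p_j taken (an assumption) to be the
    macroblock's past selection count normalized over the current set."""
    counts = [selection_counts.get(mb, 0) for mb in selected]
    total = sum(counts)
    if total == 0:
        # No history, e.g. just after a scene change: use linear weights
        return linear_weights(len(selected))
    return [c / total for c in counts]


def allocate(frame_bits: float, weights: Sequence[float]) -> List[float]:
    """Eq. (IV.33): B_ij = W_j * B_i. The weights must sum to 1 so that the
    per-macroblock allocations add back up to B_i, Eq. (IV.34)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [w * frame_bits for w in weights]


# Macroblocks 7 and 12 are selected; 12 has been selected three times as often
w = selection_weights([7, 12], {7: 2, 12: 6, 30: 9})
print(w, allocate(8000, w))  # [0.25, 0.75] [2000.0, 6000.0]
```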

Continuing this approach with static video produces interesting results. As shown in Figure IV.28, the operational rate-distortion curve is relatively flat over a wide range of bit rates. Because the coder's operational range falls into this region, rate control as described above is not possible: all of the quantizer states produce the same level of quality. However, rate control is not strictly required for static sequences. Since macroblocks are only transmitted due to aging, bit rates for static sequences are considerably lower than those observed in motion video sequences. Accordingly, open-loop rate control is adequate for static sequences. For static sequences, the quantizers are preset to the quantizer triplet that yields the lowest bit rate in the flat distortion region and fixed for the duration of the sequence.
Figure IV.28: Operational Rate-Distortion Curve for Static Sequences.


The clear implication of rate control is that any change in the quantizer setting must be communicated to the decoder. Although the operational control curve shown in Figure IV.25 reduces the amount of data used to describe each quantizer state, transmitting the quantizer setting consumes bandwidth, and the update frequency should be minimized. Therefore, at a minimum, the current quantizer vector must be transmitted with the frame header when using frame-based rate control and with each macroblock when using macroblock-based rate control. In either case, using a VLC to communicate only the change in quantizer setting, as in differential pulse code modulation (DPCM), can further reduce overhead. However, this minimal approach directly conflicts with the need for robust coding. If the frame header is damaged, the quantizer settings for that frame are lost. Differential coding creates a liability unless some facility is provided for refreshing the quantizer state after any interruption due to lost cells. To ensure that each GOB is independently decodable for robustness, the following compromises are possible. For frame-based rate control, the quantizer setting, in the form of the lookup table index, is included in every GOB header. For macroblock-based rate control, the quantizer setting is coded differentially between macroblocks within the GOB and refreshed every GOB. Differential coding within the GOB poses no liability, since a dropped cell interrupts decoding until the decoder resynchronizes with the next GOB header.
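The GOB-level compromise for macroblock-based rate control can be illustrated as follows: the first macroblock's table index travels in the GOB header and the remaining macroblocks carry only differences, so a lost GOB does not corrupt the quantizer state of the next one. The functions are hypothetical, and the VLC that would code the differences is not shown.

```python
from typing import List, Optional, Sequence, Tuple

GobQuant = Tuple[int, List[int]]  # (absolute index in GOB header, per-MB deltas)


def encode_gob(indices: Sequence[int]) -> GobQuant:
    """Absolute table index for the first macroblock, then differences."""
    deltas = [b - a for a, b in zip(indices, indices[1:])]
    return indices[0], deltas


def decode_gob(gob: Optional[GobQuant]) -> Optional[List[int]]:
    """Rebuild the per-macroblock indices. A lost GOB (None) yields None but
    does not disturb later GOBs, since each carries its own header."""
    if gob is None:
        return None
    header, deltas = gob
    indices = [header]
    for d in deltas:
        indices.append(indices[-1] + d)
    return indices


frame = [[10, 11, 11, 9], [9, 9, 12]]
coded: List[Optional[GobQuant]] = [encode_gob(g) for g in frame]
coded[0] = None  # simulate losing the first GOB to dropped cells
print([decode_gob(g) for g in coded])  # [None, [9, 9, 12]]
```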
This chapter introduced a new layered coder design motivated by the need to
provide a flexible video delivery scheme for greater robustness over heterogeneous 
networks. Attention was focused on those elements required to promote the effectiveness 
of layered coding. In general, the coder uses the fast Haar transform to decompose 
selected macroblocks into subbands, and then subbands are allocated to layers based on 
their relative perceptual importance. Specifically, a generalized layering scheme was 
devised for motion video that allows creation of an arbitrary layering scheme as a 
function of video content as evidenced by subband variance. However, a common 
layering scheme for motion video and static presentation slides is impractical since each 
attaches a different perceptual relevance to the various subbands. Therefore, different 
layering schemes are employed for each type of video content; the coder picks the 
appropriate scheme dynamically within the video sequence. 

A final issue examined was that of rate control for the layered video sequence. Since subbands are essentially layered by common variance, each layer employs a different quantization scheme. Rate control via traditional rate-distortion techniques is complicated by the increased dimensionality of the layered coder's rate-distortion surface and the possible interdependence among quantizers. Rate control is simplified by selecting a suboptimal set of quantizer vectors, where each vector consists of a step size for each quantizer, thereby effectively reducing the operational rate-distortion curve to a 1-D relationship. Rate control, either at the frame level or at the macroblock level, is then implemented via a simple table lookup.


V. TRAFFIC SMOOTHING 


The previous chapter presented a new scheme for preparing a video sequence for 
transmission over the network by coding the sequence as a hierarchical series of layers. 
The next chapter exploits the relative perceptual importance of each layer through 
priority-based scheduling. However, the manner in which the layers are transmitted to 
the network, i.e. the statistical characteristics of each cell flow, plays a role in 
determining the resources each switch must commit to the sender to guarantee that 
sender’s required QoS. In general, the more random the cell flow, the more resources, 
such as bandwidth, must be committed. Consequently, by manipulating the statistical 
characteristics of each traffic flow prior to the network, the network’s capacity for 
carrying traffic is enhanced, which is particularly desirable for low-bit-rate networks. 

This chapter examines the concept of traffic smoothing for layered video traffic as a means of increasing transmission robustness by increasing queuing efficiency. The chapter starts by discussing the concept and application of traffic smoothing. Next, the pseudo-histogram traffic model proposed by Skelly et al. for VBR video is presented [14]. The pseudo-histogram has the advantage of capturing the effect of frame-by-frame smoothing on queue behavior. Details on determining model parameters and analytical techniques for D/D/1/K queues are presented, including a simple technique for rate-controlled video. Finally, an integrated scheme is proposed for traffic smoothing of layered video traffic at various time scales: frame level, layer level, and cell level. The issue of where to apply traffic smoothing in the single-VCC and multiple-VCC cases is examined, along with the issue of mitigating the delay added by frame-by-frame smoothing.
A. INTRODUCTION 


One of the functions of ATM traffic management is call acceptance, which 
ensures that sufficient network resources exist prior to accepting a new connection with 
specified QoS requirements. The requisite resource allocation as a function of the required QoS depends on statistical properties of the connection's traffic flow. The requisite allocation may also depend on the properties of other connections currently within the network. Each new connection characterizes its anticipated traffic properties via a set of descriptors that depend on the type of service required [28]. Possible traffic descriptors include the peak cell rate (PCR), the sustainable cell rate (SCR), and the maximum
burst size (MBS). The network layer then uses these traffic descriptors and the current 
network state to determine whether to admit the call. If the call is admitted, a traffic 
contract is formed between the connection and the network. The connection agrees to 
abide by the traffic descriptors and the network agrees to allocate resources such that the 
connection’s QoS is maintained. 

Assuming that the VCC traverses sequential queues, QoS is guaranteed by 
ensuring that sufficient channel allocation exists at each queue such that the QoS 
parameters are maintained. Focusing on an individual queue, the required channel 
allocation depends on the arrival process, the QoS required, and the service process. For 
ATM networks, service is deterministic. However, the service rate depends on the 
required QoS and the arrival process. For a given QoS and a given arrival process, the 
goal is to minimize the service rate required. 

Since QoS is usually fixed for each particular traffic type, the arrival process 
weighs heavily in the channel allocation. The traffic flow within each connection may be 
viewed as a random process. In general, the channel allocation to that traffic flow 
depends on the relative uncertainty or random variation in its arrival process at a 
particular queue. In particular, the greater the uncertainty in a traffic source’s arrival 
process, the greater the bandwidth required to meet the desired QoS. For example, CBR 
traffic is completely characterized by its peak cell rate alone. By definition, the instantaneous arrival rate for VBR traffic is time varying, although the average rate is fixed.¹⁴ A simple method for characterizing the variation in the arrival rate is the ratio of PCR to average cell rate [82]. This ratio represents the burstiness of the source; a higher ratio denotes a burstier source. For a CBR source, this ratio is one. Alternately, the burstiness of a VBR source can also be expressed in terms of the variance of cell interarrival times [82].

¹⁴ Otherwise an ATM network would not be able to ensure QoS for the duration of the connection.

The problem of bandwidth allocation for a bursty source may be viewed from the 
perspective of a deterministic ATM queue. A connection is guaranteed to lose no cells if 
the service rate exceeds the arrival rate. With a bursty source, selecting a service rate equal to the source's PCR ensures that no cells are lost. However, the channel is underutilized with this allocation. Selecting a service rate equal to the average cell rate fully utilizes the channel but leads to a large amount of cell loss. Given an acceptable CLR, the appropriate service rate lies between the PCR and the average cell rate, which implies that a certain amount of underutilization must be tolerated to achieve the desired QoS. This characteristic is precisely what provides the basis for statistical multiplexing, since the aggregate multiplexed source is considerably less bursty than each individual source.

Given that uncertainty in the arrival process increases bandwidth requirements, 
altering a connection’s traffic characteristics through traffic shaping is desirable to 
increase the number of connections that may be serviced with a given amount of 
bandwidth. Alternately, traffic smoothing increases robustness during periods of 
congestion since leveling out bursts tends to reduce the probability of buffer overflows. 
Both considerations are especially important given the low-bandwidth VTC scenario
presented here. Traffic shaping may be further differentiated into the functions of traffic 
smoothing and traffic policing. Traffic smoothing attempts to reduce or control 
burstiness either at the application level or at some point prior to entry into the network. 
Traffic policing monitors a connection’s traffic parameters and takes action to correct 
deviations. For example, Usage Parameter Control (UPC) in ATM monitors each 
connection to ensure that its traffic conforms to the traffic contract [18][28]. Non- 
compliant cells are tagged and may be dropped later in the network to avoid impacting 
the QoS guaranteed to other connections. The two functions are not totally unrelated; controlling burstiness, perhaps at the application level, may be viewed as a form of self-imposed traffic policing. Here, attention is focused only on the application of traffic smoothing to video traffic.

The first logical place to implement traffic smoothing is at the application level through rate control. Rate control as presented in the last chapter represents a type of self-imposed traffic policing; the rate controller attempts to maintain some traffic statistic at a fixed level through control over the quantizer setting. Rate control therefore also provides an obvious mechanism for traffic smoothing. Forcing transmitted video to a constant bit rate completely removes the burstiness inherent in video traffic, but at the cost of potentially wide variations in quality from frame to frame. A less severe tradeoff is to settle for a constant mean bit rate, which is the approach taken in Figure IV.26. In this case, quality variations between successive frames are less noticeable, and the level of burstiness is decreased, as indicated by the drop in bit rate variance (see Table IV.9). Before rate control, the burstiness factor is 1.41; after imposition of rate control, the burstiness factor drops to 1.21. Of course, controlling only the mean bit rate does not guarantee any particular degree of smoothness. With proper design, a rate control scheme should be able to achieve an arbitrary level of smoothness that is bounded only by the permissible coding delay.
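Both burstiness measures mentioned above are simple to compute from a trace. The sketch below is illustrative only and its function names are hypothetical. Applied to frame sizes in bits per frame, it gives a frame-level peak-to-mean ratio rather than a cell-level PCR-to-average ratio, and it does not reproduce the 1.41 and 1.21 values, which come from the traces in Figure IV.26.

```python
from statistics import mean, pvariance
from typing import Sequence


def peak_to_mean(bits_per_frame: Sequence[float]) -> float:
    """Burstiness as the ratio of peak rate to average rate (1.0 for CBR)."""
    return max(bits_per_frame) / mean(bits_per_frame)


def interarrival_variance(arrival_times: Sequence[float]) -> float:
    """Alternative burstiness measure: variance of cell interarrival times."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return pvariance(gaps)


print(peak_to_mean([8000, 8000, 8000]))   # 1.0 (CBR source)
print(peak_to_mean([6000, 10000, 8000]))  # 1.25
```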

A more general method for smoothing a traffic flow prior to entry into the network is the leaky bucket scheme proposed for network access control [83][84]. Access control ensures that a traffic source does not exceed the traffic parameters agreed to as part of the traffic contract. The scheme is illustrated in Figure V.1. The basic idea is that the leaky bucket mechanism controls access to the network. ATM cells arriving at the leaky bucket must obtain a token from a token pool to enter the network. Tokens are generated at a constant rate r and placed in the token pool. Additionally, there is a maximum limit on the number of tokens in the token pool at any time, and tokens arriving after the token pool is full are discarded. The token pool is sized to control the maximum burst length from the source, i.e., the maximum number of cells that can be transmitted back-to-back. Restricting the number of tokens controls the burstiness of the source, while the token rate dictates the average cell rate. If a cell arrives and no token is available, three courses of action are possible: the cell could be discarded; the cell could be buffered until a token becomes available; or the cell could be tagged as non-compliant and transmitted. The combined effect of buffering and manipulating the token rate allows considerable flexibility in altering traffic statistics. However, buffering introduces delays in the forward transmission path, and the gain offered by smoothing must be weighed against the added delay.
Figure V.1: Leaky Bucket Access Mechanism.


While originally conceived as an access control mechanism, the leaky bucket scheme exerts its control by smoothing the traffic flow. However, this smoothing is performed for the purpose of ensuring compliance with the traffic contract. The approach may be generalized for smoothing at other points prior to network entry, such as at the application level prior to the AAL or within the AAL prior to the ATM layer. In either case, tokens are used to permit transfer of PDUs instead of ATM cells. This offers another avenue for smoothing video traffic prior to network entry. For example, CBR-type smoothing can be implemented by setting the token rate r proportional to the channel rate and setting the token pool size to one. Arriving PDUs are then buffered and transmitted to the next lower layer at the token rate, maximizing smoothness but potentially increasing the transmission delay.
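A buffered token pool of the kind shown in Figure V.1 can be simulated in a few lines. This sketch models only the buffering option (discarding and tagging are omitted), treats each unit as either a cell or a PDU, and uses hypothetical names; the token rate is in tokens per second. With a pool size of one it yields the CBR-type smoothing described above, while a larger pool lets short bursts pass back-to-back.

```python
from typing import List, Optional, Sequence


def leaky_bucket(arrivals: Sequence[float], rate: float, pool_size: float,
                 initial_tokens: Optional[float] = None) -> List[float]:
    """Departure times for PDUs (or cells) passing a buffered token pool.

    Tokens accrue at `rate` per second up to `pool_size`; each departure
    consumes one token. Units without a token wait in a FIFO buffer.
    Arrival times must be sorted and non-negative.
    """
    if pool_size < 1:
        raise ValueError("pool must hold at least one token")
    tokens = float(pool_size if initial_tokens is None else initial_tokens)
    t_last = 0.0                    # time at which `tokens` was last updated
    last_departure = float("-inf")
    departures = []
    for a in arrivals:
        t = max(a, last_departure)  # FIFO: wait for the unit ahead to leave
        tokens = min(pool_size, tokens + rate * (t - t_last))
        if tokens < 1.0:
            t += (1.0 - tokens) / rate  # wait until one full token exists
            tokens = 1.0
        t_last = t
        tokens -= 1.0
        departures.append(t)
        last_departure = t
    return departures


burst = [0.0, 0.0, 0.0, 0.0]  # four PDUs handed down at once
print(leaky_bucket(burst, rate=1.0, pool_size=1))  # [0.0, 1.0, 2.0, 3.0]
print(leaky_bucket(burst, rate=1.0, pool_size=3))  # [0.0, 0.0, 0.0, 1.0]
```

The difference between each departure and its arrival is the smoothing delay that, as noted above, must be weighed against the gain in queuing efficiency.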

Given the impact of traffic statistics on queuing efficiency, characterizing VBR 
video traffic sources via stochastic models plays an important role in network performance analysis. In particular, traffic models provide a powerful tool for analyzing the impact of the arrival process on queue behavior through either simulation or analytical methods. For example, traffic models can provide insight into determining
appropriate tradeoffs between buffer depth and service rate to achieve a desired QoS. For 
a traffic model to be useful, the model should perform two functions. First, the model 
must accurately represent traffic statistics, namely the first and second moments and the 
covariance function. Second, to evaluate QoS metrics such as cell delay and cell loss and 
to validate simulation results, the traffic model should extend to some form of analytical queuing analysis. Meeting both of these goals is a non-trivial task.
B. VIDEO TRAFFIC MODELING 


This section presents three VBR video traffic models as background for traffic 
simulations conducted in later sections and to motivate, in part, the smoothing 
mechanism presented in the next section. The autoregressive models proposed by 
Maglaris et al. [86] and Sen et al. [88] are interrelated and have been used to model VTC 
video traffic [27]. The histogram-based video traffic model proposed by Skelly et al. [14] 
is notable in that it captures the effect of smoothing video traffic on a frame-by-frame basis and provides particularly versatile queuing analysis techniques.

Modeling VBR traffic requires capturing the interdependence between coder 
design and video activity level that influence the video stream’s arrival process. 
Important factors with regard to the coder are the compression scheme employed, 
particularly in the distribution of I- and P-frames, and the presence of rate control. Video 
activity influences the compression gain through the level of scene activity or motion and 
the periodicity of scene changes. Video traffic models attempt to accurately capture the 
first and second moment statistics of the traffic source along with its covariance function. 
A useful traffic model also incorporates queuing analysis techniques that allow 
calculation of QoS metrics, such as cell delay and cell loss rate, to validate simulation results. Another desirable trait is low computational complexity.

1. Autoregressive Models


A representative video trace, in bits/pixel, is shown in Figure V.2 for a rate-controlled "talking head" scene typical of VTC. Such sequences are usually characterized by a roughly Gaussian-shaped bit rate histogram and an exponentially decaying autocorrelation function. On the strength of these observations, VBR traffic models based on first-order autoregressive processes have been proposed by Maglaris et al. [86] and Heyman et al. [87]. Using a first-order autoregressive model, the variation in bit rate is expressed as

$$\lambda(n) = a \, \lambda(n-1) + b \, w(n) \qquad \text{(V.1)}$$

where w(n) is Gaussian white noise with unit variance but a non-zero mean. The parameters in Eq. (V.1) are determined using the first- and second-order statistics measured from the video sequence along with the estimated autocorrelation function.
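Eq. (V.1) can be fitted and simulated as follows, using the standard stationary AR(1) relations (mean $b\eta/(1-a)$, variance $b^2/(1-a^2)$, and lag-k autocorrelation coefficient $a^k$), where η denotes the mean of w(n). The function names and the example statistics are hypothetical and are not measured from Figure V.2.

```python
import math
import random
from statistics import mean
from typing import List, Optional, Tuple


def fit_ar1(mu: float, var: float, rho1: float) -> Tuple[float, float, float]:
    """Match a stationary AR(1) process to a measured mean, variance and
    lag-1 autocorrelation coefficient: a = rho1, var = b^2 / (1 - a^2),
    mu = b * eta / (1 - a), where eta is the mean of w(n)."""
    a = rho1
    b = math.sqrt(var * (1.0 - a * a))
    eta = mu * (1.0 - a) / b
    return a, b, eta


def ar1_trace(a: float, b: float, eta: float, n_frames: int,
              start: float, seed: Optional[int] = None) -> List[float]:
    """Eq. (V.1): lambda(n) = a*lambda(n-1) + b*w(n), with w(n) ~ N(eta, 1)."""
    rng = random.Random(seed)
    lam, trace = start, []
    for _ in range(n_frames):
        lam = a * lam + b * rng.gauss(eta, 1.0)
        trace.append(lam)
    return trace


# Illustrative statistics in bits/pixel (not measured values)
a, b, eta = fit_ar1(mu=0.1, var=1e-4, rho1=0.9)
trace = ar1_trace(a, b, eta, n_frames=20000, start=0.1, seed=1)
print(round(mean(trace), 3))
```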
Figure V.2: Video Trace for a Low Activity Sequence.


Although a first-order autoregressive process captures the effect of bit rate variation, these models provide little insight into queuing behavior. Sen et al. [88] have proposed a model for N multiplexed video sources that can be applied in queuing analysis. The model represents the aggregate video sequence as the output of M multiplexed, identical, two-state Markov chains, or minisources, where $M \gg N$. Each minisource alternates between an off state and an active state, as shown in Figure V.3. When multiplexed, the minisources yield an equivalent (M + 1)-state Markov chain in which each state transmits at a fixed multiple of R cells/second. Using 20 or more minisources per video source reduces the effect of quantization. The model's parameters, α, β, and R, are determined from the first and second moments as well as the autocorrelation function of a single video source; all video sources are assumed to have the same statistical characteristics. Given the model parameters, cell loss probability and buffer occupancy statistics are determined through fluid-flow analysis [27]. A shortcoming of the minisource model is its inability to model an arbitrary bit rate histogram, since the bit rate follows a binomial distribution [14].
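The binomial state distribution that limits the minisource model is easy to see in code. This sketch assumes that α is the off-to-on transition rate and β the on-to-off rate of each minisource; the function names are hypothetical.

```python
from math import comb
from typing import List


def minisource_state_probs(M: int, alpha: float, beta: float) -> List[float]:
    """Stationary probability that k of M independent on-off minisources are
    active, k = 0..M. Each minisource is on with probability
    alpha / (alpha + beta), so the number of active ones is binomial."""
    p_on = alpha / (alpha + beta)
    return [comb(M, k) * p_on ** k * (1.0 - p_on) ** (M - k) for k in range(M + 1)]


def minisource_mean_rate(M: int, alpha: float, beta: float, R: float) -> float:
    """Mean aggregate rate, where state k transmits at k*R cells/second."""
    probs = minisource_state_probs(M, alpha, beta)
    return sum(k * R * p for k, p in enumerate(probs))


print(minisource_state_probs(2, 1.0, 1.0))  # [0.25, 0.5, 0.25]
```

Whatever values α, β, and R take, the state probabilities keep this binomial shape, which is why an arbitrary measured bit rate histogram cannot be matched.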

While both of the above models do a good job of characterizing bit rate variations within a scene, neither attempts to capture the effect of scene changes. Given the behavior of motion-compensated video coders, aperiodic bit rate peaks are expected at scene changes since, following a scene change, most macroblocks are intracoded for lack of a suitable reference in the last frame.¹⁵
Figure V.3: Minisource Video Model.


2. Histogram-based Traffic Modeling 


The histogram-based video traffic model proposed by Skelly et al. [14] represents an intermediate approach between autoregressive modeling and self-similar traffic models. The premise of the model is very simple: quantize the arrival rates and then approximate the video sequence by its quantized version. Motivation for the model stems from the need to smooth the video traffic flow. Dixit and Skelly, in an earlier work [89], demonstrated the relation between traffic smoothing and ATM multiplexer performance. Given a buffered, compressed video frame, the resulting ATM cells could be transmitted in several ways. For example, the cells could be transmitted at the peak available channel rate until the buffer is emptied. The resulting traffic is very bursty, since the video coder transmits at a high rate for a brief period and then falls idle for the rest of the frame. The problem with this approach is that when several sources are multiplexed, any correlation between the burst periods tends to increase cell loss dramatically.

¹⁵ For an analogous reason, periodic bit rate peaks occur in MPEG-encoded sequences due to the GOP structure.

Dixit and Skelly [89] instead proposed transmitting the buffered cells randomly over the entire frame interval as a Poisson stream to the ATM multiplexer. Skelly et al. [14] combined this smoothing scheme with the quantized video traffic model described above. Each quantized level represents a single frame, and the cells of each quantized level are transmitted to the multiplexer as a Poisson stream over one frame interval. Assuming that transitions between levels may occur every frame and that the transitions are memoryless, the resulting traffic model is a discrete-time multi-state Markov-modulated Poisson process (MMPP), as shown in Figure V.4 (some transitions are removed for clarity). The Markov chain modulates the underlying Poisson-smoothed arrival process, where each state i corresponds to a Poisson process whose arrival rate $\lambda_i$ matches the size of the compressed frame in bits for that state. Shroff [15] later expanded the MMPP model into the generalized histogram model, also known as a Markov-modulated rate process (MMRP), which incorporates arrival processes other than Poisson [15]. In particular, Shroff demonstrated that the maximum queuing efficiency in ATM multiplexers is achieved by smoothing deterministically, i.e., by transmitting cells at equal intervals throughout the frame interval. The result resembles a modulated CBR process with a new rate every frame.
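The three ways of releasing one frame's worth of cells discussed above (peak-rate bursts, Poisson spreading, and deterministic spacing) can be sketched as follows. The Poisson variant conditions on the number of cells in the frame, using the fact that, given the count, Poisson arrival instants are distributed like sorted uniform samples; this is a simplification of a free-running Poisson stream. All names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def peak_rate_burst(n_cells: int, cell_time: float, t0: float = 0.0) -> List[float]:
    """Unsmoothed: cells leave back-to-back at the peak channel rate."""
    return [t0 + k * cell_time for k in range(n_cells)]


def poisson_smoothing(n_cells: int, frame_interval: float, t0: float = 0.0,
                      seed: Optional[int] = None) -> List[float]:
    """Random spreading over the frame: given the number of arrivals, the
    instants of a Poisson process are distributed like sorted uniforms."""
    rng = random.Random(seed)
    return sorted(t0 + rng.uniform(0.0, frame_interval) for _ in range(n_cells))


def deterministic_smoothing(n_cells: int, frame_interval: float,
                            t0: float = 0.0) -> List[float]:
    """Equal spacing across the frame interval."""
    spacing = frame_interval / n_cells
    return [t0 + k * spacing for k in range(n_cells)]


def smooth_sequence(cells_per_frame: Sequence[int], frame_rate: float) -> List[float]:
    """Deterministic smoothing frame after frame: a CBR stream whose rate is
    reset at every frame boundary (a modulated CBR process)."""
    T = 1.0 / frame_rate
    times: List[float] = []
    for i, n in enumerate(cells_per_frame):
        if n > 0:
            times.extend(deterministic_smoothing(n, T, t0=i * T))
    return times


# Three frames at 10 frames/s carrying 2, 0 and 4 cells
print(smooth_sequence([2, 0, 4], frame_rate=10.0))
```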
Figure V.4: Markov-modulated Poisson Process (MMPP).


3. Determining Model Parameters 


The histogram model parameters consist of the MMRP state probabilities, the 
state transition probabilities, and the state arrival rates and are estimated from the video 
sequence in the following manner. The video sequence is uniformly quantized into n 
bins, where each bin represents a single state. The quantized arrival rates $\lambda_i$ represent the
arrival rates for their respective states. Next, transition probabilities between states are 
measured directly from the quantized sequence yielding the state transition matrix P. The 
steady-state distribution is given by

$$\pi = [\pi_1, \pi_2, \ldots, \pi_n], \tag{V.2}$$

where $\pi_i$ is the steady-state probability for state $i$. The state probabilities can be
determined by solving the eigenequation

$$\pi P = \pi, \qquad \sum_{i=1}^{n} \pi_i = 1. \tag{V.3}$$

Alternately, $\pi$ is the left eigenvector of $P$ whose corresponding eigenvalue is 1 [49]. Since
the rate of the modulating process is much slower than that of the modulated process, an
equivalent continuous-time Markov process is determined from [27]

$$M = f(P - I), \tag{V.4}$$
where f is the frame rate, and M is the infinitesimal generating function representing 
transition rates from each state. 
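The estimation procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to obtain the parameters in Appendix B: the bins span the observed range uniformly, each state's rate is taken as its bin midpoint (the text does not say which representative value is used), a state with no observed exit falls back to the empirical state frequencies, the stationary distribution is found by power iteration rather than an explicit eigen-solver, and `estimate_mmrp` is a hypothetical helper name.

```python
def estimate_mmrp(frame_bits, n_bins, frame_rate):
    """Return (rates, P, pi, M) for an n_bins-state histogram (MMRP) model."""
    lo, hi = min(frame_bits), max(frame_bits)
    width = (hi - lo) / n_bins or 1.0            # guard against a constant trace
    # Uniform quantization: map each frame size to a state index 0..n_bins-1.
    states = [min(int((b - lo) / width), n_bins - 1) for b in frame_bits]
    # Assumed state rate: the bin midpoint (bits per frame).
    rates = [lo + (i + 0.5) * width for i in range(n_bins)]
    freq = [states.count(i) / len(states) for i in range(n_bins)]

    # Count one-step transitions in the quantized sequence and normalize rows -> P.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1
    P = []
    for row in counts:
        total = sum(row)
        # Assumption: a state never observed to exit uses the empirical frequencies.
        P.append([c / total for c in row] if total else list(freq))

    # Stationary distribution pi = pi P, Eq. (V.3), by power iteration.
    pi = [1.0 / n_bins] * n_bins
    for _ in range(10_000):
        new = [sum(pi[i] * P[i][j] for i in range(n_bins)) for j in range(n_bins)]
        done = max(abs(a - b) for a, b in zip(new, pi)) < 1e-12
        pi = new
        if done:
            break

    # Continuous-time generator M = f (P - I), Eq. (V.4).
    M = [[frame_rate * (P[i][j] - (1.0 if i == j else 0.0)) for j in range(n_bins)]
         for i in range(n_bins)]
    return rates, P, pi, M
```

Power iteration converges for the aperiodic chains typical of measured traces; a strictly periodic trace would call for a direct eigenvector solution instead.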

Once the model parameters have been determined, one check of the model’s
fit is to compare the model’s first and second moments and autocorrelation function
to those of the actual sequence. For the model, the mean is given by
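$$E[\lambda(n)] = \sum_{i=1}^{n} \pi_i \lambda_i. \tag{V.5}$$

The autocorrelation function is given by [27]

$$R(k) = E[\lambda(n)\lambda(n+k)] = \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_i \lambda_j \, P[\lambda(n+k) = \lambda_j \mid \lambda(n) = \lambda_i] \, P[\lambda(n) = \lambda_i]. \tag{V.6}$$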
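As a fitness check, Eqs. (V.5) and (V.6) can be evaluated directly from the model parameters. The sketch below assumes a row-stochastic $P$ and uses the standard Markov-chain identity that the $k$-step conditional probability $P[\lambda(n+k) = \lambda_j \mid \lambda(n) = \lambda_i]$ equals $(P^k)_{ij}$, while $P[\lambda(n) = \lambda_i] = \pi_i$ in steady state. The helper names are illustrative.

```python
def mat_mul(A, B):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def model_mean(pi, rates):
    """Eq. (V.5): E[lambda] = sum_i pi_i * lambda_i."""
    return sum(p * r for p, r in zip(pi, rates))

def model_autocorrelation(pi, P, rates, k):
    """Eq. (V.6) at lag k frames, using (P^k)_{ij} for the k-step probabilities."""
    n = len(pi)
    Pk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0 = I
    for _ in range(k):
        Pk = mat_mul(Pk, P)
    return sum(rates[i] * rates[j] * Pk[i][j] * pi[i]
               for i in range(n) for j in range(n))
```

Comparing `model_autocorrelation` over a range of lags against the sample autocorrelation of the trace gives the kind of match reported below; as $k$ grows, $R(k)$ tends to the squared mean.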
Since the histogram model approximates the actual histogram of the given video 
sequence, the model is able to support a wide range of video activity and compression 
schemes. For example, while the MMRP model does not explicitly model scene changes, 
the peaks in bit rate resulting from scene changes are implicitly captured in the higher 
states. Skelly et al. [14] presented results from 10-second JPEG-encoded sequences taken
from “Star Wars”. Compared to the original sequences, the eight-bin model predicts a 
slightly higher mean bitrate and provides a good match for the autocorrelation function 
over a range of four seconds (96 frames). While increasing the resolution of the 
histogram did not dramatically change the approximation, employing fewer than eight bins
resulted in a poor approximation. With rate-controlled video segments, satisfactory 
results have been reported using as few as six states [90]. 
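Given the histogram parameters for a single source, an equivalent histogram for $N$
homogeneous sources may be obtained through $N - 1$ convolutions [91]:

$$\pi^{(N)} = \pi^{(N-1)} * \pi, \qquad \pi^{(1)} = \pi. \tag{V.7}$$

The state arrival rates are given by

$$\lambda_i^{(N)} = N\lambda_1 + (i-1)\Delta\lambda, \qquad \Delta\lambda = \lambda_2 - \lambda_1, \qquad i = 1, 2, \ldots, N(n-1)+1. \tag{V.8}$$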
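For homogeneous sources, Eq. (V.7) is an ordinary discrete convolution of the state-probability vector with itself, and Eq. (V.8) follows from the uniform spacing of the quantized rates. A minimal sketch, assuming uniformly spaced state rates and hypothetical function names:

```python
def convolve(a, b):
    """Discrete convolution of two probability vectors."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def aggregate_homogeneous(pi, rates, N):
    """Eqs. (V.7)-(V.8): histogram of N identical, independent sources."""
    agg = list(pi)
    for _ in range(N - 1):                 # N - 1 convolutions
        agg = convolve(agg, pi)
    d = rates[1] - rates[0]                # uniform spacing, Delta lambda
    # Index i here is 0-based, so it plays the role of (i - 1) in Eq. (V.8).
    agg_rates = [N * rates[0] + i * d for i in range(len(agg))]
    return agg, agg_rates
```

Each convolution lengthens the vector by $n - 1$ entries, so $N$ sources with $n$ states each give $N(n-1)+1$ aggregate states, which is the range of $i$ in Eq. (V.8).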
For heterogeneous sources, the process is slightly more difficult, and the equivalent
histogram must be resolved one source at a time. Given two non-equivalent histograms,
the joint histogram may be written as a two-dimensional Markov chain whose states are
all pairs $(m,n)$ of states from the two original chains [27]. The probability for state $(m,n)$ is given by

$$\pi_{m,n} = \pi_m^{(1)} \pi_n^{(2)}, \tag{V.9}$$

and the aggregate arrival rate by

$$\lambda_{m,n} = \lambda_m^{(1)} + \lambda_n^{(2)}, \tag{V.10}$$

where the superscripts identify the two sources and the indices $m,n$ refer to the corresponding states in the original Markov chains.
The result can be converted back into a one-dimensional histogram by renumbering the
states in order of increasing arrival rate. Compared to the homogeneous case, the size of
the aggregate histogram grows much more rapidly, although coalescing states or deleting
states that are highly improbable relative to the simulation length can reduce the
size.
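The heterogeneous case of Eqs. (V.9) and (V.10) can be sketched as an outer product of the two state-probability vectors followed by a re-sort by aggregate rate. In this illustration, pairs with identical aggregate rates are merged and an optional probability floor `min_prob` (a hypothetical parameter) drops improbable states before renormalizing; both are assumptions standing in for the coalescing and deletion mentioned above.

```python
def aggregate_pair(pi1, rates1, pi2, rates2, min_prob=0.0):
    """Eqs. (V.9)-(V.10) for two independent sources, flattened to a 1-D histogram.

    Rates are used as dictionary keys, so integer (or rounded) rates are assumed
    to avoid spurious floating-point mismatches between equal sums.
    """
    joint = {}
    for p_m, r_m in zip(pi1, rates1):
        for p_n, r_n in zip(pi2, rates2):
            rate = r_m + r_n                                  # Eq. (V.10)
            joint[rate] = joint.get(rate, 0.0) + p_m * p_n    # Eq. (V.9), merged by rate
    kept = {r: p for r, p in joint.items() if p >= min_prob}  # drop improbable states
    total = sum(kept.values())
    ordered = sorted(kept)                                    # renumber by increasing rate
    return [kept[r] / total for r in ordered], ordered
```

Applying `aggregate_pair` repeatedly, one source at a time, resolves a heterogeneous mix into a single histogram.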
4. Queuing Analysis


Cell loss analysis proceeds by invoking a quasi-static behavior for the MMRP 
model and assuming that the rate of the modulating process, the Markov chain, is far 
slower than the rate of the modulated process, the state arrival rate. With this 
assumption, the queue is expected to reach equilibrium rapidly compared to the time 
interval between frames, and each state may be treated as an independent source. The
probability that the buffer contains $n$ cells is given by [27]

$$P[N = n] = \sum_{i} P[N = n \mid \lambda = \lambda_i] \, \pi_i, \tag{V.11}$$

where $\pi_i$ are the state probabilities, and $P[N = n \mid \lambda = \lambda_i]$ is the probability that the buffer
contains $n$ cells given the arrival rate $\lambda_i$. From Eq. (V.11), the buffer distribution for each
individual state depends on the arrival process to the buffer, which in turn depends on the 
smoothing mechanism and the type of service granted. Given that ATM uses fixed-length
cells, service is usually deterministic. Although the original histogram model used
Poisson smoothing, Shroff [15] has demonstrated that deterministic smoothing yields
better queuing performance. Therefore, further discussion is limited to D/D/1/K
queuing systems. Equation (V.11) indicates that the transition rates between states, and
by extension the shape of the autocorrelation function given by Eq. (V.6), play no role
in determining the buffer occupancy distribution, contrary to what would be expected if self-similarity were
a significant factor. Indeed, Skelly’s [14] results indicate that accurately capturing the
histogram plays a greater role in modeling buffer distributions than the
actual shape of the autocorrelation function.




Although the buffer distribution is of interest, analyzing cell loss is more important in
determining appropriate buffer depths. The system loss probability, assuming that states
are independent, is given by [27]

$$P_L = \frac{1}{E[\lambda]} \sum_{i=1}^{n} \pi_i \lambda_i P_{L,i}, \tag{V.12}$$

where $E[\lambda]$ is given by Eq. (V.5), $\pi_i$ are the state probabilities, $\lambda_i$ are the arrival rates, and
$P_{L,i}$ is the loss probability for state $i$. Equation (V.12) represents the aggregate loss
rate as the sum of cells lost from each state over a long interval, weighted over each of the $n$
states and divided by the expected number of arrivals. The individual loss rates in Eq. (V.12)
depend on the queuing system being evaluated. For a D/D/1/K queuing system, assuming
a very long sojourn time $T$ for each state allows a simple approximation for the loss rate
[15]. If the arrival rate is less than the service rate, no cells are lost, since an arriving
cell finds the server idle or servicing a cell. If the service rate is less than the arrival rate,
cells that are neither serviced during the sojourn time nor buffered are lost. The loss probability in this
case is given by

$$P_{L,i} = \lim_{T \to \infty} \frac{(\lambda - \mu)T - K}{\lambda T} = 1 - \frac{1}{\rho}, \qquad \rho = \frac{\lambda}{\mu}, \tag{V.13}$$

where $T$ is the sojourn time, $\lambda$ is the arrival rate, and $\mu$ is the service rate. Considering
both scenarios, the loss rate for the $i$th state for deterministic arrivals and deterministic
service is

$$P_{L,i} = \begin{cases} 0, & \lambda_i \le \mu, \\ 1 - \dfrac{\mu}{\lambda_i}, & \lambda_i > \mu. \end{cases} \tag{V.14}$$


Substituting the result from Eq. (V.14) for each state into Eq. (V.12) gives the system 
loss probability. 
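Combining Eqs. (V.5), (V.12), and (V.14), the system loss probability reduces to the mean excess of the arrival rate over the service rate, normalized by the mean arrival rate. A minimal sketch, assuming all rates are expressed in the same units (for example, cells per second) and using hypothetical function names:

```python
def state_loss_dd1k(lam, mu):
    """Eq. (V.14): loss probability of one state under the long-sojourn approximation."""
    return 0.0 if lam <= mu else 1.0 - mu / lam

def system_loss(pi, rates, mu):
    """Eq. (V.12): cells lost per state, weighted by pi_i, over the mean arrival rate."""
    mean_rate = sum(p * r for p, r in zip(pi, rates))             # Eq. (V.5)
    lost = sum(p * r * state_loss_dd1k(r, mu) for p, r in zip(pi, rates))
    return lost / mean_rate
```

For example, two equally likely states at 80 and 120 cells/s served at 100 cells/s give a system loss of 0.1. The buffer size $K$ never appears in the calculation.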
For D/D/1/K systems, Eq. (V.14) indicates the counterintuitive result that cell loss
probability is independent of queue size $K$. However, cell loss behavior exhibits
two distinct patterns that depend on buffer size, the cell region and the burst region, as
shown in Figure V.5 [27]. In the cell region, cell loss drops rapidly with buffer size, and
cell losses are confined to individual cells. This region is modeled well by Eq. (V.12). In
the burst region, cell loss drops with buffer size at a slower, but still exponential, rate; cell
losses occur in bursts in this region, a behavior not captured by the histogram model.
Equation (V.14) indicates that both regions coalesce into a constant value for D/D/1/K
systems. However, simulations show that these systems instead lie in the burst region.
Figure V.5: Cell and Burst Regions for Cell Loss.


Shroff offers an ad hoc technique for estimating cell loss probability using MMRP
models by incorporating fluid level analysis to capture behavior in the burst region [15].
In the cell region, loss is calculated using Eq. (V.12) with an appropriate expression for
$P_{L,i}$. In the burst region, fluid level analysis is used to predict the exponential relationship
with queue size in the form

$$P(x > K) = A e^{\delta K}, \tag{V.15}$$

where $\delta$ is the dominant eigenvalue from the fluid level representation of the system. Using
the infinitesimal generating function $M$ for the histogram model, $\delta$ is the least negative
eigenvalue of the matrix $D^{-1}M$, where $D$ is given by

$$D = \mathrm{diag}(\lambda_1 - \mu, \lambda_2 - \mu, \ldots, \lambda_n - \mu). \tag{V.16}$$
The constant $A$ in Eq. (V.15) is determined by piecing the cell region and burst region
curves together at the cutoff point $K_0$ where both curves have equal slopes. Then the
constant $A$ is a function of the cutoff buffer size and the cell region loss probability at that
buffer size,

$$A = P(x > K_0)\big|_{\text{cell region}} \, e^{-\delta K_0}. \tag{V.17}$$


Together, Eq. (V.12) and Eq. (V.15) provide a complete description of the cell loss
behavior with queue size. An MMRP system with deterministic arrivals represents a
special case since the queue is always in the burst region. The cell loss probability is
determined by correcting Eq. (V.12) directly by the factor $e^{\delta K}$.
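Shroff's burst-region fit, Eqs. (V.15) through (V.17), can be sketched with NumPy. The sketch assumes that no state rate equals $\mu$ (so $D$ is invertible), that the mean arrival rate is below $\mu$, that $M$ is in transitions per second and the rates in cells per second (so $\delta$ is per cell), and that the cutoff $K_0$ and the cell-region loss at $K_0$ have already been obtained from Eq. (V.12). The function names are hypothetical.

```python
import math

import numpy as np

def burst_decay_rate(M, rates, mu):
    """delta: least negative eigenvalue of D^{-1} M, with D from Eq. (V.16)."""
    D = np.diag(np.asarray(rates, dtype=float) - mu)
    eigenvalues = np.linalg.eigvals(np.linalg.inv(D) @ np.asarray(M, dtype=float))
    # The generator always contributes a zero eigenvalue; keep strictly negative ones.
    negative = [z.real for z in eigenvalues if z.real < -1e-12]
    return max(negative)

def burst_region_loss(K, delta, K0, p_cell_at_K0):
    """P(x > K) in the burst region, Eqs. (V.15) and (V.17)."""
    A = p_cell_at_K0 * math.exp(-delta * K0)     # match the cell-region value at K0
    return A * math.exp(delta * K)
```

Because $D^{-1}M$ is similar to $MD^{-1}$, either matrix yields the same $\delta$; the first form simply follows Eq. (V.16) as written.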

For multiplexed sources, cell loss probability is determined by applying the above 
techniques to the equivalent histogram resulting from numerical convolution of the 
individual histograms. Shroff’s technique extends easily in the case of multiplexed
homogeneous sources but becomes more difficult with heterogeneous sources [92].

5. Application


In the next section and the next chapter, an MMRP model is used to represent a
deterministically smoothed layered video traffic source. The model is used both as a
traffic source in OPNET simulations and as an analytical model for queuing calculations.
Model parameters were derived from the rate-controlled sequence shown in Chapter IV.
The actual parameters are given in Appendix B.
C. SMOOTHING LAYERED VIDEO TRAFFIC 


Traffic smoothing improves multiplexer performance; the implied benefits are a 
degree of bandwidth conservation, which permits the network to guarantee QoS for a 
given level of traffic with less bandwidth. This is particularly desirable for the low bit 
rate network envisioned in Chapter II. While smoothing has been discussed previously,
coverage has focused on network-level traffic shaping for both traffic policing and 
improving multiplexer performance. In this section, we propose a new smoothing 
scheme targeting layered video that is notable in two ways. First, we focus on 
developing a practical smoothing mechanism implemented at the sender prior to the
ATM layer. The goal is to avoid manipulating traffic streams at the ATM layer, since
maintaining a separation between network and client layer functionality is desirable to
preserve network interoperability. Second, the smoothing mechanism covers three time
scales: frame level, layer level, and cell level. The first is considered briefly, while the
latter two are the main focus of this section.

Based on previous discussion, rate control provides frame-level smoothing by 
limiting variations from the target bit rate. This type of smoothing is obtained essentially 
as a byproduct, since rate control is a necessary component of ensuring compliance with
the traffic contract in ATM networks. As shown in Figure IV.26 and using the values
given in Table IV.9, the rate control mechanism discussed in Chapter IV produces an
approximate 16% decrease in burstiness.
1. Cell Level Traffic Smoothing 


While rate control smooths variations in bit rate over multiple frames, a more
explicit approach is to smooth at the cell level by controlling interarrival times to the 
ATM multiplexer. As discussed in the last section, this exact concern partially motivated 
Skelly’s [14] histogram traffic model. Following Skelly’s approach of smoothing 
individual frames, we propose an analogous smoothing scheme implemented via a leaky 
bucket type mechanism. The basic approach is shown in Figure V.6. Smoothing 
proceeds by modulating the arrival rate into the network for each individual frame. Each 
compressed frame is buffered prior to transmission into the network, and portions of the 
compressed frame, termed transmission units for now, are released for transmission 
whenever a token is available. Tokens are generated at a fixed rate r and only a single 
token is available at a time. The combined effect is to deterministically smooth the flow 
of transmission units by releasing them for transmission at intervals of 1/r seconds. The 
token rate r is evaluated anew each frame and is set to the arrival rate for the current 
frame as measured in transmission units per second. In this manner, the token rate is 
assigned a value sufficient to ensure that the entire frame is transmitted during a single 
frame interval. For example, if a transmission unit consists of 300 bits and the current 
compressed frame size is 6000 bits, the token rate must be set to $20f$ tokens per second,
where $f$ is the frame rate. Since this scheme occurs downstream from the video coder,
rate control is not explicitly a part of the smoother. However, the benefit of rate control
appears indirectly through the interaction of frame size with the rate controller.
Figure V.6: Cell Level Traffic Smoothing.
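The per-frame token rate of Figure V.6 and the resulting release instants can be sketched as follows. Transmission units are assumed to be of fixed size, and a partially filled final unit is rounded up to a whole unit; both choices, and the helper names, are assumptions for illustration.

```python
import math

def token_rate(frame_bits, tu_bits, frame_rate):
    """Tokens per second needed to send one compressed frame within one frame interval."""
    units = math.ceil(frame_bits / tu_bits)      # transmission units in this frame
    return units * frame_rate

def release_times(frame_start, frame_bits, tu_bits, frame_rate):
    """Deterministic release instants, one transmission unit every 1/r seconds."""
    r = token_rate(frame_bits, tu_bits, frame_rate)
    n_units = math.ceil(frame_bits / tu_bits)
    return [frame_start + k / r for k in range(n_units)]
```

With the numbers in the text (a 6000-bit frame and 300-bit units) and $f = 10$ fps, `token_rate(6000, 300, 10)` returns 200 tokens per second, that is, $20f$, and the 20 units are released 5 ms apart, with the last one leaving before the next frame interval begins.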


Practical implementation of the smoothing scheme shown in Figure V.6 for 
layered video raises many additional issues. With layered video traffic, smoothing can be 
performed on a per-layer basis or on the entire video stream. Only the latter approach is 
examined here because of its simplicity. The next concern is how cells from each layer 
are interleaved for transmission. With FCFS scheduling, the order is unimportant but 
appears to play an important role in priority-based scheduling. A final concern is the 
identity of the transmission unit mentioned above. Smoothing must be implemented at 
some point prior to network entry, which in turn implies smoothing must be performed at 
the source node [93]. Examining the ATM protocol stack in Figure II.2, the ATM layer
marks the beginning of the network since the ATM layer includes network management
functionality. Therefore, ATM cells are not a suitable transmission unit unless new
functionality is added to the ATM layer. Instead, smoothing must be implemented above
the ATM layer, either prior to the AAL or within the AAL. In either case, a suitable
candidate for the transmission unit is a higher layer PDU, either an application PDU or an
AAL-PDU. However, assuming that processing times within the lower layers are fixed, a
suitable scheme can be devised that provides an effect equivalent to smoothing
transmission of ATM cells within the ATM layer.
2. Predictive Smoothing


The primary drawback of the smoothing scheme identified in Figure V.6 is the 
added transmission delay. The delay consists of two components. First, since 
transmission is smoothed over an entire frame interval, a delay equal to the frame interval 
is inserted into the transmission path. For video-on-demand applications, the added delay 
poses no difficulty, but for interactive applications the delay becomes of greater concern 
as the frame rate decreases. For example, at 10 fps, a delay of 100 ms is created. To
meet the ITU-T standard of 150 ms for interactive applications [5], total transmission and 
queuing delay cannot exceed 50 ms. While stringent, this delay requirement appears 
feasible for the LOS wireless network considered here since transmission delays are 
small. The second delay component is due to the need to wait for the entire frame to be 
encoded before the token rate $r$ is determined.

While the buffering delay is set by the frame rate, the delay due to frame encoding 
may be reduced by instead predicting the size of the compressed frame [94]. The 
predicted value is used to set the token rate such that transmission units are transmitted 
immediately as they become available from the coder. Taking advantage of the 
correlated nature of the compressed video stream, the size of the current frame $B(n)$ can
be predicted from the sizes of the last $P$ frames:

$$\hat{B}(n) = \sum_{k=1}^{P} a_k B(n-k), \tag{V.18}$$

where $a_k$ are the filter weights. Several predictive techniques appear feasible for
determining the weights in Eq. (V.18). Work by Randhawa and Hardy [95] on VBR
video traffic streams found good results using the LMS algorithm. Another approach,
considered by the author [94], determines the weights adaptively using the RLS
algorithm [96] for the next frame during transmission of the current frame. The RLS 
algorithm offers the advantage of requiring less information about the input sequence 
than does LMS. Given that some error is present in the estimate, two scenarios are 
possible during each frame interval. If the prediction is too high, the transmission buffer
empties before the frame interval expires; with sufficient accuracy, however, the
benefits of smoothing cell delivery are still realized. If the prediction is too low, the
buffer still has cells for transmission from the current frame, which are added to the size 
estimate for the next frame. To account for the learning period of the RLS algorithm, the 
predicted values are only employed once the prediction error has dropped below a
threshold value related to the encoder delay.
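The adaptive predictor of Eq. (V.18) can be sketched with the standard exponentially weighted RLS recursion. The forgetting factor (0.99), the initial inverse-correlation scale (100), and the class name are illustrative assumptions, not values reported in [94]. The threshold test on the prediction error described above is left to the caller, which would trust `predict()` only once recent errors are small enough.

```python
class RLSFrameSizePredictor:
    """P-tap linear predictor of compressed frame size, Eq. (V.18), adapted by RLS."""

    def __init__(self, taps=3, forgetting=0.99, init=100.0):
        self.p = taps
        self.lam = forgetting                      # exponential forgetting factor
        self.w = [0.0] * taps                      # filter weights a_1 .. a_P
        # Inverse correlation matrix estimate, initialized to init * I.
        self.Pm = [[init if i == j else 0.0 for j in range(taps)] for i in range(taps)]
        self.history = []                          # [B(n-1), B(n-2), ..., B(n-P)]

    def predict(self):
        """Predicted size of the next frame, or None until P frames have been seen."""
        if len(self.history) < self.p:
            return None
        return sum(a * b for a, b in zip(self.w, self.history))

    def update(self, actual_bits):
        """Adapt the weights using the size of the frame just encoded."""
        if len(self.history) == self.p:
            x = self.history
            Px = [sum(self.Pm[i][j] * x[j] for j in range(self.p)) for i in range(self.p)]
            denom = self.lam + sum(x[i] * Px[i] for i in range(self.p))
            gain = [v / denom for v in Px]
            err = actual_bits - sum(a * b for a, b in zip(self.w, x))   # a priori error
            self.w = [a + g * err for a, g in zip(self.w, gain)]
            # P <- (P - gain x^T P) / lam; x^T P equals (P x)^T because P is symmetric.
            self.Pm = [[(self.Pm[i][j] - gain[i] * Px[j]) / self.lam
                        for j in range(self.p)] for i in range(self.p)]
        self.history = ([actual_bits] + self.history)[: self.p]
```

In use, `update()` is called with each frame's actual size once it has been encoded, and `predict()` supplies the size estimate that sets the token rate for the next frame.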


Using simulated VTC traffic, the RLS algorithm appears reasonably accurate in 
predicting compressed frame sizes. To validate the approach proposed above, the RLS 
algorithm was used to predict frame size for a VTC sequence generated from a modified 
version of the minisource model proposed by Sen et al. [88] described in Section V.A. 
They reported that fluid source modeling with 20 minisources per video source produced 
reasonable agreement with queuing simulations. Using the minisource parameters 
reported for VTC traffic in [27], Sen’s model was modified by replacing each minisource
with a statistically equivalent first-order autoregressive process to remove the effect of
quantization. Figure V.7 shows the resulting video stream along with the predicted
values using three taps (using more taps did not improve accuracy). Figure V.8 shows
the corresponding prediction error. Assuming an encoder delay of 20-40 ms, predicting
frame sizes in this manner effectively saves 15-35 ms of delay per frame. These results
are only valid for low-activity video, such as that found in VTC. Tests with video
containing a large number of scene changes and/or a high degree of motion generated
larger prediction errors.
Figure V.7: Simulated and Predicted VBR VTC Sequences Using RLS.
Figure V.8: Prediction Error for 3-Tap Filter Using RLS.


3. Interleaving/Transmission Order


The hierarchical nature of layered video facilitates the use of priority-based 
scheduling schemes within the network to ensure that service is granted in accordance
with perceptual importance. Previous work by Luo and Zarki [16] indicates that when
transmitting priority-based traffic, performance is degraded by the degree to which cells
from different priority classes are segmented together. Their results suggest that while 
smoothing the interarrival times of cells to the ATM multiplexer can increase queuing 
efficiency, the order in which cells are transmitted from each layer must also be 
considered to promote priority-based scheduling. 

The need to smooth cell traffic across layers is demonstrated by a simple example.
Consider a layered video stream in which each layer has a distinct priority. Further
assume that all cells for each layer from a GOB or a frame are concatenated prior to
transmission. Each layer can be viewed as a separate cell flow as depicted in Figure V.9.
As a result, each cell flow now appears more bursty than the parent cell flow. If cell
flows of similar priority from different connections become correlated in time, the result
is higher cell loss in that priority class. This is analogous to the problem with correlation
between bursty video streams originally addressed by Dixit [89]. If high-priority cell
arrivals from different connections become correlated at the ATM multiplexer, the
expected benefit derived from prioritization is lost, since only one priority class is
available for scheduling. Another viewpoint is that a connection is given only a finite
number of service opportunities in a given time interval. Giving higher-priority cells
precedence is only effective if those cells are available in the queue at the instant of
scheduling. Concatenating priority levels, or in this case cells from different layers,
creates time intervals wherein higher-priority cells are not arriving into the queue and
therefore creates periods where they are not available for service. Obviously, the impact
is controlled to some extent by buffer size.
Figure V.9: Equivalent Cell Flows for Prioritized Traffic.


We propose smoothing across layers by interleaving cells from the different 
layers. Interleaving maximizes the average distance between cells (in time) from a 
particular layer and provides the maximum smoothing of cell interarrival times for that 
layer. However, this in no way affects the interarrival times between adjacent cells in the 
connection, which are set solely in response to frame size. The degree of interleaving 
depends on the ratio of cells available from each layer. Consider three layers with
an average bit allocation among layers of 4:2:2. Figure V.10 presents several possible
interleaving arrangements ranging from complete segmentation to maximum
interleaving, where C# identifies an individual cell and # its parent layer.
Figure V.10: Several Possible Interleaving Schemes Given a 4:2:2 Ratio Among Layers.
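One way to derive a per-frame interleaving pattern from a layer ratio (for example, the previous frame's allocation) is smooth weighted round robin, a well-known proportional scheduler. It is shown here only as an illustrative stand-in; it is not claimed to reproduce any arrangement in Figure V.10, nor to be optimal. The function name is hypothetical.

```python
def interleave_pattern(cells_per_layer):
    """Order of layer ids (1-based) for one frame, via smooth weighted round robin."""
    total = sum(cells_per_layer)
    current = [0] * len(cells_per_layer)
    pattern = []
    for _ in range(total):
        for i, w in enumerate(cells_per_layer):
            current[i] += w                    # every layer earns credit by its weight
        # Pick the layer with the most credit; ties go to the lowest (base) layer.
        pick = max(range(len(current)), key=lambda i: current[i])
        current[pick] -= total
        pattern.append(pick + 1)
    return pattern

print(interleave_pattern([4, 2, 2]))   # [1, 2, 3, 1, 1, 2, 3, 1]
```

For a 4:2:2 ratio, each layer receives exactly its share of slots, and the base-layer cells are spread across the frame instead of being sent as one run.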
Cell interleaving is accomplished by modifying the leaky bucket mechanism
shown in Figure V.6 as follows. A separate queue is maintained for each layer. The
token rate is generated based on the cumulative size of all layers (which is just the
frame size). Each time a token is available, an interleaver selects a cell from one of the
buffers in accordance with the desired interleaving scheme. The modified smoothing
technique is shown in Figure V.11. In practice, the bit allocation among layers varies
over time; therefore, creating an a priori fixed interleaving scheme may be impractical.
A reasonable approach, given the correlation between successive frames, is to assume
that the bit allocation among layers for the last frame is indicative of the allocation in the
current frame. Then the ratio of bits allocated to each layer can be used to create an
interleaving pattern for the current frame. A simpler low-delay approach would impose a
round-robin order on transmissions for those layers having cells available for transmission.
Figure V.11: Cell Smoothing and Cell Interleaving Over One Frame.
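The low-delay round-robin variant of the modified smoother can be sketched as below: one queue per layer, a token spacing of $1/r$ with $r$ set from the total number of units in the frame, and an interleaver that skips layers whose queues are empty. Function and argument names are hypothetical.

```python
from collections import deque

def smooth_and_interleave(layer_units, frame_start, frame_rate):
    """Return (release_time, layer_id, unit) tuples for one frame."""
    queues = [deque(units) for units in layer_units]   # one queue per layer
    total = sum(len(q) for q in queues)                # frame size in units
    if total == 0:
        return []
    interval = 1.0 / (total * frame_rate)              # token spacing 1/r
    schedule, layer, k = [], 0, 0
    while k < total:
        if queues[layer]:                              # empty layers are skipped
            schedule.append((frame_start + k * interval, layer + 1, queues[layer].popleft()))
            k += 1                                     # one token consumed per unit
        layer = (layer + 1) % len(queues)
    return schedule
```

For a 4:2:2 frame this yields the layer order 1, 2, 3, 1, 2, 3, 1, 1: the base layer ends with a short run once the smaller layers are exhausted, which is the price of not precomputing a ratio-based pattern.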











Further consideration of the impact of cell interleaving on scheduling
performance is deferred until the next chapter. In Chapter VI, after proposing a new
scheduling algorithm for layered video, the effect of cell interleaving is demonstrated
through traffic simulations by regulating the interleaving pattern within the traffic model.

4. Smoothing: Single VCI Case


The discussion above addresses smoothing over three logical entities in the video 
stream: frames, layers, and cells. Smoothing frames to satisfy a bit constraint is 
accomplished at the application layer by a rate controller, such as the scheme described in 
Chapter IV. The remaining smoothing is accomplished by using either the leaky bucket 
smoother shown in Figure V.6 or the leaky bucket interleaver shown in Figure V.11. 
Placement of the smoother is not an arbitrary decision. As discussed earlier, all 
smoothing must be done prior to network entry. This leaves the option of implementing 
the leaky bucket mechanism either prior to the AAL or within the AAL. Each approach 
has merits, but implementation prior to the AAL avoids the need for further
modifications to the AAL. In either case, implementation carries the smoothing effect 
over to the ATM cell flow assuming fixed delays throughout the protocol stack. 

Working with the AAL2 scheme proposed for multiplexing a layered video 
source over a single VCC (see Section II.D.4), the first approach is to insert the leaky 
bucket mechanism at the application layer prior to the SSCS sublayer as shown in Figure 
V.12. The bit stream issuing from the coder for each layer is buffered at the smoother. 
As tokens become available, the smoother selects a transmission unit from one of the 
buffers in accordance with the interleaving scheme and forwards it to the appropriate 
AAL SAP. As discussed in Section II.D.4, a block size of 44 octets works well with the 
CPCS sublayer since, with overhead, this fits within exactly one ATM cell. Accordingly, 
an appropriate transmission unit is 44 octets for the smoother. Within the AAL, the 
SSCS sublayer merely hands the PDU over to the CPCS sublayer for encapsulation. 
Since the smoother multiplexes the flow of PDUs into the AAL, no explicit support for 
multiplexing is required within the CPCS sublayer. The only other requirement is that 
the coder must signal the smoother at the start of each new frame so that the token rate can be
recalculated.
Figure V.12: Smoothing at the Application with a Single VCI.


The second approach is to insert the leaky bucket within the CPCS sublayer prior
to the ATM SAP as shown in Figure V.13. Here, CPCS-PDUs are buffered individually
for each layer and handed to the ATM SAP as tokens become available in accordance
with the interleaving scheme. The appropriate transmission unit in this case is therefore
the CPCS-PDU, and the smoother performs multiplexing within the CPCS
sublayer. Once again, the video coder must signal the smoother after each frame to allow
computation of a new token rate.


Independent of whether smoothing is implemented prior to the AAL or within the
AAL, arbitrary cell interleaving precludes segmenting cells as shown in Figure
II.13 to allow network nodes the option of identifying GOB boundaries. Arguably, since
a GOB is some fraction of a frame, one-ninth of a frame in the coder presented in Chapter
IV, organizing cells into layer GOBs does provide some benefit of cell interleaving. If
GOB boundaries are to be respected, the traffic smoother shown in Figure V.11 must be
modified to buffer individual GOBs from each layer instead of a complete layer from
each frame as originally proposed. Remaining GOBs are held at the application level
until required. The interleaver services each buffer sequentially, starting with the lowest
layer. As each token arrives, the interleaver draws an appropriate transmission unit from
the buffer until the buffer is exhausted. After all buffers are exhausted in similar fashion,
all buffers are refilled.
Figure V.13: Smoothing Within the AAL with a Single VCI.
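The GOB-respecting interleaver described above can be sketched as follows. The nested-list input `gobs[layer][g]` (the transmission units of GOB `g` in a given layer) is a hypothetical structure chosen for illustration, and token timing is omitted because it is the same as in the plain smoother.

```python
def gob_order(gobs):
    """Transmission order (layer_id, gob_index, unit) when GOB boundaries are kept."""
    n_gobs = max(len(layer) for layer in gobs)
    order = []
    for g in range(n_gobs):                            # refill every buffer with GOB g
        for layer, layer_gobs in enumerate(gobs, start=1):
            if g < len(layer_gobs):
                for unit in layer_gobs[g]:             # drain this buffer, one unit per token
                    order.append((layer, g, unit))
    return order
```

Within each GOB period the lowest layer is drained first, so the interleaving granularity is one GOB rather than one cell.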


5. Smoothing: Multiple VCI Case


The multiple VCI case was covered in Section II.D.3 and differs from the
previous case in that a separate VCI is used to transmit each layer, and AAL5 is
employed due to the lower overhead. Implementing smoothing again involves the same
considerations presented in the last section, except that buffering at the application level
forces reconsideration of AAL5 for the AAL protocol.

The first approach is to insert the leaky bucket mechanism at the application layer 
prior to the CS sublayer as shown in Figure V.14. The bit stream emerging from the coder
for each layer is buffered at the smoother. As tokens become available, the smoother
selects a transmission unit from one of the buffers in accordance with the interleaving
scheme and forwards it to the appropriate AAL SAP. Here, using AAL5 reveals a
distinct lack of efficiency. To realize the benefits of smoothing at the ATM level,
processing between the smoother and the ATM layer should be minimized. This is most
simply accomplished by transmitting only that amount of data that will eventually fit
within a single ATM cell. If AAL5 is used, this would limit the transmission unit to 40
octets. In light of this, AAL2 offers a more efficient path, since the size of the
transmission unit can be increased to 44 octets. Therefore, the multiple VCI case
assumes AAL2 as shown in Figure V.12. Within the AAL, further operation works as
described in the single-VCI counterpart, except that the CPCS-PDUs are not multiplexed
within the CPCS sublayer and are instead transmitted to their respective ATM SAPs.
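As a quick check of the 44- and 40-octet figures, assuming the ITU-T I.363.2 AAL2 layout (a 3-octet CPS packet header plus a 1-octet CPS-PDU start field) and the 8-octet AAL5 CPCS-PDU trailer, each transmission unit exactly fills the 48-octet ATM cell payload:

```latex
\begin{aligned}
\text{AAL2:}\quad & 44~(\text{unit}) + 3~(\text{CPS packet header}) + 1~(\text{start field}) = 48 \text{ octets} \\
\text{AAL5:}\quad & 40~(\text{unit}) + 8~(\text{CPCS-PDU trailer}) = 48 \text{ octets}
\end{aligned}
```

Under these assumptions, AAL2 carries 10% more video payload per cell (44 versus 40 octets).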

The second approach is to insert the leaky bucket within the SAR sublayer prior 
to the ATM SAPs as shown in Figure V.15. Here, SAR-PDUs are buffered individually 
for each layer and handed to the ATM SAP as tokens become available in accordance 
with the interleaving scheme. Obviously, the appropriate transmission unit in this case is 
the SAR-PDU. Otherwise, operation proceeds as described for the single-VCI case 
except that released SAR-PDUs are assigned to the ATM SAP appropriate for that layer. 
Figure V.14: Smoothing at the Application with Multiple VCIs.
Figure V.15: Smoothing Within the AAL with Multiple VCIs.


6. Smoothing Results 


The bandwidth conservation produced by this smoothing may be evaluated by
constructing a histogram model for the controlled and uncontrolled sequences shown in
Figure IV.26. The estimated CLR for each stream is shown in Figure V.16 using a buffer
size of 10 cells, which guarantees that queuing delay does not exceed 50 ms (the
significance of this value is discussed below). CLRs were calculated using Shroff’s [15]
ad hoc analysis technique. The result demonstrates that the rate-controlled stream
requires far less bandwidth to guarantee a given QoS, although the difference would be
expected to lessen to some extent when multiplexing multiple video streams. Results for three
multiplexed homogeneous sources are shown in Figure V.17 and reveal an even more
dramatic gulf between the rate-controlled and uncontrolled sources. While the histogram
approach does incorporate frame-by-frame smoothing, the difference in queuing
performance demonstrated here is attributable to the effect of the rate controller on the
video stream’s histogram.
Figure V.16: Estimated CLR for Rate-controlled and Non-rate-controlled VTC Traffic.
Figure V.17: Estimated CLR for Rate-controlled and Non-rate-controlled VTC Traffic (3 sources).


To illustrate the effect of smoothing at the cell interarrival level, an OPNET
simulation involving three homogeneous sources was created. Two cases were
considered: in the first, individual frames were smoothed deterministically, that is, cell
interarrival times were fixed within each frame; in the second, individual frames were smoothed in a Poisson
fashion. In the latter case, cell interarrival times were exponentially distributed such that the
average arrival rate matched the frame size over one frame interval. While Poisson smoothing
does not capture the more bursty behavior described in Dixit’s work [89], it
does reveal the effect of randomness on queuing efficiency. The results for
several different traffic loads are shown in Figure V.18. In each case, deterministic
smoothing yields better performance for a given load.
Figure V.18: Deterministic and Poisson Intraframe Smoothing.
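The two intraframe smoothing cases can be sketched as follows. This is a simplified stand-in for the OPNET model, not the model itself: `frame_arrivals` spaces one frame's cells either evenly or with exponential gaps of the same mean, and `fifo_losses` counts losses at a single server with deterministic service and a finite buffer. The names and the buffer convention (the cell in service does not occupy a waiting place) are assumptions.

```python
import random
from collections import deque
from typing import List, Optional


def frame_arrivals(n_cells: int, t0: float, frame_period: float, mode: str,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Arrival times of one frame's cells, starting at t0.

    deterministic: fixed spacing frame_period / n_cells.
    poisson: exponential gaps with mean frame_period / n_cells, so the mean rate
    equals the frame size per frame interval (arrivals may spill past the frame).
    """
    if n_cells <= 0:
        return []
    if mode == "deterministic":
        gap = frame_period / n_cells
        return [t0 + i * gap for i in range(n_cells)]
    if mode == "poisson":
        rng = rng or random.Random(0)
        rate = n_cells / frame_period
        times, t = [], t0
        for _ in range(n_cells):
            t += rng.expovariate(rate)
            times.append(t)
        return times
    raise ValueError(f"unknown mode: {mode}")


def fifo_losses(arrivals: List[float], service_time: float, buffer_cells: int) -> int:
    """Cells lost at a single server with deterministic service and buffer_cells
    waiting places (the cell in service does not occupy a waiting place)."""
    in_system = deque()            # departure times of cells queued or in service
    last_departure = float("-inf")
    lost = 0
    for t in sorted(arrivals):
        while in_system and in_system[0] <= t:
            in_system.popleft()    # these cells have already left
        if len(in_system) > buffer_cells:   # server busy and every waiting place full
            lost += 1
            continue
        last_departure = max(t, last_departure) + service_time
        in_system.append(last_departure)
    return lost
```

Feeding the same frame sequence through both modes at a fixed load and comparing `fifo_losses` mirrors the comparison in Figure V.18: evenly spaced cells never collide as long as the spacing exceeds the service time, whereas exponential gaps occasionally bunch cells together.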


The final issue examined is the effect of cell interleaving, or layer smoothing, on scheduler efficiency. Again, three homogeneous sources were considered, but the multiplexer implemented a service discipline using the layered scheduling algorithm discussed in the next chapter. For a traffic load of 0.8 and a bit ratio of 2:1:1 among the three layers, queuing performance was examined for different levels of concatenation in the base layer; specifically, run lengths of 2, 4, 6, and 8 were examined. The results are shown in Figure V.19. As the run length of cells from the highest priority layer increases, the CLR of that layer rises. At the same time, the CLR of the lower priority layers decreases until the largest run length is reached. The results indicate that, as anticipated, minimizing the run length of the higher priority cells gives better performance. Based on observations of the queuing behavior, the lower CLR at short run lengths results from the scheduler having easier access to higher priority cells, which maximizes scheduling opportunities for those cells.
Figure V.19: Effect of Cell Concatenation on Scheduler Performance.
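The text does not state how enhancement-layer cells were placed between base-layer runs, so the sketch below assumes a simple round-robin rule: in each turn the base layer emits up to `base_run` cells and every other layer emits a proportional share, at least one cell. The function name and the proportional rule are hypothetical; the sketch only shows how a 2:1:1 ratio and a base-layer run length determine the cell order seen by the multiplexer.

```python
from typing import List, Sequence


def interleave_layers(cells_per_layer: Sequence[int], base_run: int) -> List[int]:
    """Emit layer indices in round-robin turns (layer 0 = base layer).

    Hypothetical rule: layer 0 sends up to base_run cells per turn; layer k sends
    round(base_run * n_k / n_0) cells per turn, but at least one.
    """
    n0 = cells_per_layer[0]
    quota = [base_run] + [max(1, round(base_run * n / n0)) for n in cells_per_layer[1:]]
    remaining = list(cells_per_layer)
    order: List[int] = []
    while any(remaining):
        for k, q in enumerate(quota):
            take = min(q, remaining[k])
            order.extend([k] * take)
            remaining[k] -= take
    return order
```

With a 2:1:1 split of eight cells, a run length of 2 gives the order 0, 0, 1, 2, 0, 0, 1, 2, while a run length of 4 gives 0, 0, 0, 0, 1, 1, 2, 2, concentrating the base-layer cells into longer bursts.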


This chapter examined the use of smoothing to increase multiplexer efficiency and thereby lower bandwidth requirements for multiplexed VBR traffic. Smoothing was considered at three time scales: frames, layers, and cells. Rate control effectively smooths at the frame level and is part of the transmitting application. Smoothing across layers and cells requires inserting a smoothing mechanism in the transmission path prior to network entry. A smoothing mechanism based on the leaky bucket algorithm was presented, and its implementation for layered video traffic was explored. A traffic model for smoothed video was also presented; it is used in the simulations of the next chapter.
VI. SCHEDULING LAYERED VIDEO TRAFFIC


The final element in delivering layered video is a network scheduler that exploits the perceptual hierarchy inherent in layered video, prioritizing delivery to mitigate the effects of congestion. As shown in Figure VI.1, a network scheduler is implemented at each switch within the network and controls access to a network resource, namely the capacity of the outgoing line. The switch's scheduling algorithm is responsible for sharing the line capacity among several customers, a difficult problem when each customer has different QoS requirements. The manner in which the scheduler is implemented determines the maximum number of customers that can be served within the available capacity. Therefore, a tight relationship exists between the scheduling policy and the admissions policy.
Figure VI.1: A Switch Controlling Access to a Network.








A scheduling policy consists of a queuing discipline and, optionally, a cell discard policy. For the queue in Figure VI.1, the scheduling policy determines how cells awaiting service are queued and the order in which they are served. The choice of scheduling policy directly affects the ability to meet QoS in two ways. First, queued cells experience delay equal to the gap between their arrival time and service time. Second, queues are finite; if the node experiences a cell burst, the queue may fill, causing all further cell arrivals to be discarded. A larger queue has a smaller probability of cell loss but imposes potentially greater delays on arriving cells. The scheduling policy may be coupled with a cell discard policy, which helps guarantee QoS and responds to congestion. One example of a cell discard policy is to monitor the queue length and, if the length exceeds a threshold, discard lower priority cells to prevent congestion.
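The threshold discard policy just described can be sketched in a few lines. This is an illustration only: the function name, the cell representation (priority 0 is highest), and the choice to shed every non-zero priority above the threshold are assumptions.

```python
from collections import deque
from typing import Deque, Tuple

Cell = Tuple[int, str]  # (priority: 0 = highest, larger = lower; payload)


def enqueue_with_threshold(queue: Deque[Cell], cell: Cell, capacity: int,
                           threshold: int) -> bool:
    """Admit a cell unless the queue is full, or unless the queue length exceeds
    threshold and the cell is low priority (priority > 0). Returns True if queued."""
    if len(queue) >= capacity:
        return False              # buffer overflow: a cell of any priority is lost
    if len(queue) > threshold and cell[0] > 0:
        return False              # congestion: shed lower priority cells early
    queue.append(cell)
    return True
```

Above the threshold, the remaining buffer space is held in reserve for high priority cells, which is the intent of the policy.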

This chapter provides a scheduling mechanism that meets three criteria, detailed in Section A: it guarantees QoS, performs optimal scheduling for different traffic classes, and prioritizes cell delivery from each layer as required. The discussion starts with a short survey of scheduling algorithms that incorporate QoS requirements into scheduling decisions, leading to the STEBR algorithm proposed by Uziel [39]. STEBR is an optimal scheduling algorithm for heterogeneous traffic and is used as the basis for designing a scheduler for layered traffic. STEBR is modified for layered traffic by incorporating the notion of priority within a connection. Prioritization is introduced through a filtering mechanism that subordinates the QoS granted to lower priority layers to that received by higher priority layers. A partial GOB discard scheme is presented that drops unusable cells to increase effective bandwidth utilization. Finally, OPNET simulations are presented to verify the value of the proposed schemes.
A. SCHEDULING CRITERIA 


The specific problem examined here is to design a scheduler for a layered video traffic stream that meets several criteria. The simplest criterion is that the scheduler should meet the QoS requirements of the video stream. Although the layered coder is designed for robustness, limits on the cell loss rate help deliver a less distracting viewing experience by limiting fluctuations in reconstructed quality. Since VTC is an interactive application, limits on scheduling delay are also required. Because QoS guarantees are desired, a policy that ignores QoS, such as first come, first served (FCFS), is clearly impractical: with FCFS service, the scheduler simply serves the cell at the head of the queue. The requirement here is a scheduling policy that integrates each connection's QoS requirements into scheduling decisions.

Given a low-bit-rate networking environment, another criterion is that the scheduling policy maximize utilization of resources. Maximum utilization may be viewed as maximizing throughput or, alternatively, supporting the largest number of customers, each with its own QoS constraints. Given two scheduling policies, the better policy is the one that admits the larger number of connections. To demonstrate the effect of scheduling policy, consider a network with two service classes, each representing a fixed set of QoS parameters. The admissible region is two-dimensional, representing all allowable combinations of connections from each service class. As described above, an FCFS scheduling policy does not inherently take QoS into account and therefore gives the smallest admissible region. An optimal scheduling policy gives the largest admissible region. This situation is shown in Figure VI.2 for the two service classes. As the number of connections for either service class goes to zero, both scheduling strategies give the same performance. In general, if k service classes are defined, the resulting admissible region is k-dimensional, and finding and verifying an optimal scheduling strategy is difficult.
Figure VI.2: Admission Regions for FCFS and an Optimal Scheduling Policy.


The final criterion required of the scheduler is to exploit the hierarchical nature of the layered video stream in choosing which cells to serve and which to deny service. This goes beyond the notion of guaranteeing QoS to a connection. Whenever the scheduler has insufficient bandwidth to transmit all waiting cells, perhaps due to a traffic burst or congestion, cell loss is inevitable. The problem is devising an intelligent mechanism for deciding which cells in a connection to serve or, alternatively, which cells to deny service. With a layered video stream, the relative perceptual importance of each layer imposes an inherent transmission priority. Any loss in the base layer is catastrophic: some portion of the picture cannot be reconstructed. Losses from the enhancement layers only degrade reconstructed quality. Therefore, an intelligent service policy favors transmitting cells from the higher priority layers at the expense of those from lower priority layers as required. This points to a hierarchical service policy in which the layered video connection is assigned a certain QoS, i.e., a certain amount of bandwidth is allocated for all layers to share. However, during periods when the QoS cannot be maintained, a transmission priority is enforced that allocates bandwidth to the more perceptually important layers. In this manner, congestion causes the reconstructed video to degrade gracefully but remain viewable.

Another facet of the layered stream to consider is the hierarchy imposed on the organization of the bit stream within each layer. In particular, after cell loss a decoder can resynchronize only at selected points within the bit stream. Therefore, a single cell loss may render subsequent cells unusable to the decoder. Since these cells could otherwise hinder transmission of other viable cells, a suitable cell discard scheme that discards unusable cells regardless of QoS constraints could increase effective utilization of the outgoing link.
B. QOS SCHEDULING ALGORITHMS 


Referring to Figure VI.3, scheduling algorithms allocate bandwidth among different traffic sources according to the QoS required by each source. As indicated above, FCFS is the simplest scheduling policy but treats all service classes equally and is not suitable for an integrated services network. This section describes several classes of scheduling algorithms, starting with an overview of early efforts and ending with two novel methods proposed by Uziel [39], who also provides a more comprehensive review of scheduling algorithms.
Figure VI.3: Scheduling Different Priority or QoS Classes.


1. Reservation Schemes


Static priority scheme (SPS) algorithms differentiate between QoS requirements through priority assignments. Each traffic source is assigned a priority, and each cell is tagged with the appropriate priority. Cells are served in priority order, while cells with the same priority are served FCFS. This policy may be envisioned by replacing the single queue in Figure VI.1 with a queue for each priority level, as shown in Figure VI.3. Higher priority queues are served until emptied, and cells within each queue are served FCFS. SPS algorithms are simple to implement and provide flexibility in serving different traffic classes but perform poorly in certain situations. For example, if high priority cells have looser delay (maxCTD) requirements while lower priority cells have stricter maxCTD requirements, the low priority cells will receive poor service and face potentially high loss rates.
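A static priority scheduler reduces to one FIFO queue per priority level, served strictly in order. The minimal sketch below (class and method names are illustrative) makes the weakness just described visible: `dequeue` never consults a cell's maxCTD, so a tight-deadline cell placed in a lower level waits behind every cell in the higher levels.

```python
from collections import deque
from typing import Any, Deque, List, Optional


class StaticPriorityScheduler:
    """One FIFO queue per priority level; level 0 is served first."""

    def __init__(self, levels: int) -> None:
        self.queues: List[Deque[Any]] = [deque() for _ in range(levels)]

    def enqueue(self, cell: Any, priority: int) -> None:
        self.queues[priority].append(cell)

    def dequeue(self) -> Optional[Any]:
        # A lower priority queue is touched only when every higher one is empty.
        for q in self.queues:
            if q:
                return q.popleft()
        return None
```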

A related approach is bandwidth reservation, in which traffic sources are guaranteed a bandwidth allocation in proportion to a traffic statistic or QoS requirement. Bandwidth may be allocated among n traffic sources by simply dividing the capacity evenly, by apportioning bandwidth according to mean bit rate, or by a weighted combination of the mean and variance of the bit rate [97]. However, each of these approaches fails to account for QoS requirements and offers only a marginal improvement over FCFS. A better approach is to assign bandwidth as a function of required QoS [98]. For example, if each traffic source i has a maxCTD of T_i, then each source receives a guaranteed bandwidth of

$$\mathrm{BW}_i = \frac{1/T_i}{\sum_{j=1}^{n} 1/T_j}\, C_l \qquad \text{(VI.1)}$$

where n customers compete for service and C_l is the line capacity. The drawback of both static allocation and bandwidth reservation is that they may leave the server underutilized with bursty traffic, since spare capacity cannot be reallocated among sources.
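Assuming the reservation rule of Eq. (VI.1) weights each source by the inverse of its maxCTD T_i, so that tighter delay bounds receive more bandwidth, the allocation is a short calculation. The function name is illustrative, and capacity may be in any unit (cells/s or b/s).

```python
from typing import List, Sequence


def reserve_bandwidth(max_ctd: Sequence[float], capacity: float) -> List[float]:
    """Split capacity in proportion to 1/T_i (tighter delay bound -> larger share)."""
    weights = [1.0 / t for t in max_ctd]
    total = sum(weights)
    return [capacity * w / total for w in weights]
```

For example, three sources with maxCTDs of 10, 20, and 40 ms sharing 700 cells/s receive 400, 200, and 100 cells/s. The shares are fixed regardless of instantaneous load, which is exactly why idle capacity cannot be lent to a bursting source.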
2. STE and BCLPR


The shortest time to extinction (STE) algorithm proposed by Panwar et al. [99] handles traffic sources with deadlines (maxCTDs). The goal is to maximize the fraction of cells entering service before their respective deadlines or, equivalently, to minimize cell loss due to expiration for G/D/1 queues. Each cell entering the queue is assigned a deadline, or time of expiration (ToE), that diminishes the longer the cell waits in the queue. Service periods are divided into slots; each slot represents the time required to serve one cell. At the beginning of every service slot, the ToE of each cell is updated. Cells that have missed their deadline to start service, indicated by a ToE less than one service slot, are dropped from the queue. Of the remaining cells, the cell with the lowest ToE is served. STE is optimal with respect to cell loss rate and is simple to implement. However, STE is not optimal for heterogeneous traffic streams, where each stream may have a different maximum CLR.
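One STE service slot, as just described, can be sketched as follows, with ToE measured in service slots. The function name and cell representation are illustrative assumptions.

```python
from typing import List, Optional, Tuple

Cell = Tuple[str, float]  # (connection id, ToE measured in service slots)


def ste_slot(cells: List[Cell]) -> Tuple[Optional[Cell], List[Cell], List[Cell]]:
    """One STE service slot.

    Drops cells whose ToE is under one slot, serves the cell with the lowest ToE,
    and returns (served, dropped, survivors) with the survivors' ToE reduced by one slot.
    """
    dropped = [c for c in cells if c[1] < 1]
    alive = [c for c in cells if c[1] >= 1]
    if not alive:
        return None, dropped, []
    served = min(alive, key=lambda c: c[1])
    alive.remove(served)
    survivors = [(conn, toe - 1) for conn, toe in alive]
    return served, dropped, survivors
```

Note that the choice depends only on ToE; the connection identifier plays no part, which is why STE cannot honor different CLR targets.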

Uziel has proposed a new scheduling algorithm, the balanced-CLP-ratio (BCLPR) algorithm, that improves upon STE for heterogeneous traffic [39]. BCLPR makes scheduling decisions based on each connection's CLR requirement, along with STE's approach of dropping cells that are unable to meet their service deadline. BCLPR calculates two statistics for each connection: an instantaneous CLP (ICLP)

$$\mathrm{ICLP}[i] = \frac{DS[i]}{A[i]} = \frac{\text{cells discarded from connection } i}{\text{total cells arrived from connection } i} \qquad \text{(VI.2)}$$

and a cell-loss probability ratio (CLPR)

$$\mathrm{CLPR}[i] = \frac{\mathrm{ICLP}[i]}{\mathrm{ACLP}[i]} \qquad \text{(VI.3)}$$

which compares the instantaneous CLP with the allowable CLP (the CLR QoS constraint). The algorithm employs the following steps at the beginning of each service interval. The ToE of each cell is calculated, and expired cells are dropped from the queue. The ICLP and CLPR statistics for each active connection are updated, and the first cell in the queue from the connection with the largest CLPR is selected for service. If two or more connections have the same CLPR, a cell is selected at random from one of those connections.

Over time, BCLPR ensures that each connection is granted at least its guaranteed QoS. If a connection's CLPR exceeds one, the connection is getting less than its guaranteed QoS and has a greater chance of receiving a service slot from the scheduler. A connection with a CLPR less than one is getting better QoS than guaranteed, so it receives correspondingly less service. Over time, the average CLPR of each source approaches the same value; how close that value is to one depends on the scheduler loading.
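The BCLPR selection rule can be sketched directly from the two statistics, taking ICLP as cells discarded over cells arrived and CLPR as ICLP divided by the allowable CLP. The dictionary layout and function names are assumptions; expiry handling is omitted and would run before the selection.

```python
import random
from typing import Dict, Iterable, Optional, Tuple

Stats = Tuple[int, int, float]  # (DS[i], A[i], ACLP[i])


def clpr(ds: int, a: int, aclp: float) -> float:
    """CLPR = (DS / A) / ACLP; defined as zero before any arrivals."""
    return (ds / a) / aclp if a else 0.0


def bclpr_pick(stats: Dict[str, Stats], waiting: Iterable[str],
               rng: random.Random) -> Optional[str]:
    """Connection whose head-of-line cell BCLPR serves next; ties broken at random."""
    ratios = {c: clpr(*stats[c]) for c in waiting}
    if not ratios:
        return None
    best = max(ratios.values())
    tied = sorted(c for c, r in ratios.items() if r == best)
    return rng.choice(tied)
```

Because only the ratio matters, a connection with a strict ACLP that has lost a few cells can outrank a lenient connection that has lost many, which is how BCLPR balances heterogeneous CLR targets.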

Summarizing, STE is optimal with respect to cell loss rate for homogeneous traffic. BCLPR uses cell loss rates in its scheduling decisions, which improves performance with heterogeneous traffic. However, BCLPR does not consider the proximity of a cell to expiration in scheduling decisions. This leads to poor performance for bursty traffic, because the scheduler may choose to serve one connection while ignoring a burst of cells from another connection that is nearing expiration in the queue. In fact, an oscillation can arise in which a connection receives service only after losing a cell burst, which degrades system throughput [93].
3. STEBR


The STE with BCLPR (STEBR) scheme proposed by Uziel [39] corrects this behavior by considering the instantaneous loss rate experienced by each source, the deadlines of cells within the queue, and the expected losses if service is denied, assuming no further arrivals. STEBR makes optimal scheduling decisions in the sense that no other algorithm for a single-queue, single-server system has a larger admissible region. STEBR employs a predictive cost function, derived from each connection's current CLPR, that increases with the number of cells discarded.

Each cell in the queue is assigned a cost representing the future impact of service denial on the parent connection's CLPR. The cost for connection i is

$$\mathrm{Cost}[i] = \mathrm{CLPR}[i]\Big|_{\text{one additional loss}} = \frac{DS[i] + 1}{A[i] \times \mathrm{ACLP}[i]} = \mathrm{CLPR}[i] + \left(A[i] \times \mathrm{ACLP}[i]\right)^{-1} \qquad \text{(VI.4)}$$

where DS[i] is the number of cells discarded from connection i, A[i] is the number of arrivals from connection i, ACLP[i] is the desired CLP for connection i, and CLPR[i] indicates how well the connection is being served. The cell closest to expiration is assigned this cost. Working towards the back of the queue, newer cells are assigned incrementally greater costs in a linear fashion, where the increment is

$$\Delta_i = \left(A[i] \times \mathrm{ACLP}[i]\right)^{-1} \qquad \text{(VI.5)}$$

Since any scheduling decision made for the current service slot may cause other cells to expire because they are denied service, STEBR schedules the cell that minimizes the overall system cost across all connections using the above cost function.
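The linear cost assignment can be written out for one connection. The sketch assumes the cost of the cell closest to expiration is the CLPR after one more loss, DS/(A·ACLP) + Δ, with each newer cell adding one more increment Δ = 1/(A·ACLP). The function name is illustrative.

```python
from typing import List


def stebr_cell_costs(ds: int, a: int, aclp: float, n_cells: int) -> List[float]:
    """Costs for one connection's queued cells, ordered from the cell closest to
    expiration (first) to the newest (last).

    First cell: CLPR + Delta (CLPR after one more discard); each newer cell adds Delta,
    where Delta = 1 / (A * ACLP).
    """
    delta = 1.0 / (a * aclp)
    first = ds / (a * aclp) + delta
    return [first + k * delta for k in range(n_cells)]
```

For a connection with 100 arrivals, one discard, and an allowable CLP of 0.01, three queued cells receive costs 2, 3, and 4. A connection with few arrivals has a large increment, so each extra loss weighs heavily against it.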

At the start of each service slot, the queue is scanned and cells that have expired are dropped. Next, the CLPR of each connection is updated, and each cell is assigned a cost using Eq. (VI.4) and Eq. (VI.5). STEBR then partitions the waiting cells by assigning each cell to the latest future service slot in which the cell could receive service and still avoid expiration (a value of 1 indicates that the cell will expire if not granted service now). The service slot for the jth cell is calculated as

$$T_j = \left\lfloor \mathrm{ToE}[j] \times C_l \right\rfloor \qquad \text{(VI.6)}$$

where C_l is the channel capacity in cells/second. Each service slot may hold multiple cells from one or more connections. Working from the latest service slot back to the current one, the algorithm assigns exactly one cell to each service slot by examining the cost of each cell. If a connection has multiple cells in a service slot, only the one with the maximum cost is considered. The service slot is awarded to the connection with the highest-cost cell. Cells not selected for service are moved down to the preceding service slot, and the procedure is repeated. This recognizes that although a cell not selected for service in slot k will expire if deferred to a later slot, service in slots 1 to k-1 is still feasible. The process is repeated until slot 1, which represents the current service slot, is reached. The cell awarded service in slot 1 is actually transmitted. Note that any cells originally assigned to slot 1 that are not selected for transmission will be discarded during the next service interval. Uziel [39] provides several examples that illustrate this process.
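The backward slot assignment can be sketched by keeping every cell explicitly rather than using the linear bookkeeping of the original algorithm. The sketch assumes the latest feasible slot is floor(ToE × C_l) and that expired cells were already removed. The cell tuple layout and function name are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, float, float]  # (connection id, ToE in seconds, cost)


def stebr_select(cells: List[Cell], capacity: float) -> Optional[Cell]:
    """Return the cell STEBR transmits in the current slot, or None.

    Assumes expired cells (ToE * capacity < 1) were already dropped.
    """
    slots: Dict[int, List[Cell]] = {}
    for cell in cells:
        n = math.floor(cell[1] * capacity)          # latest feasible service slot
        slots.setdefault(n, []).append(cell)
    carried: List[Cell] = []
    for n in range(max(slots, default=0), 0, -1):   # latest slot back to slot 1
        contenders = carried + slots.get(n, [])
        if not contenders:
            continue
        winner = max(contenders, key=lambda c: c[2])  # highest-cost cell wins slot n
        contenders.remove(winner)
        carried = contenders                           # losers move down to slot n - 1
        if n == 1:
            return winner                              # only slot 1 is transmitted now
    return None
```

With a capacity of one cell per second, cells (A, ToE 2.5 s, cost 0.9), (B, 1.2 s, 0.5), and (C, 1.5 s, 0.7) yield C: A holds the highest cost but can still be served in slot 2, so slot 1 goes to the costlier of the two urgent cells. As described above, a lone cell assigned to slot 2 is not transmitted in the current slot; an implementation that must keep the link busy would need an additional rule here, which the description does not specify.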


C. LAYERED SCHEDULING 


1. QoS Filtering and the STEBR Algorithm 


The strategy for a layered video connection is to have the switch first maintain the contracted QoS for the connection overall and then maintain a specified QoS for each layer within the connection. Emphasizing the connection's QoS ensures that the connection receives fair access to the bandwidth originally granted by the network. The QoS received by each layer within the connection is subordinate to the goal of preserving QoS for the higher priority layers during periods of congestion. That is, we choose to reallocate bandwidth dynamically within a connection to maintain QoS for the higher priority layers. Two schemes for this dynamic reallocation are examined here. The first scheme selectively employs prioritization in a hierarchical fashion: cells from lower priority layers are denied service only when higher priority layers are not receiving their guaranteed QoS and have cells awaiting service. Selective prioritization allows explicit QoS guarantees for each layer that are then relaxed for the lower priority layers during periods of congestion. As shown later, this approach offers efficient utilization of the outgoing link. However, the hierarchical dependence within the video stream makes the transmission of base layer cells imperative. Since the lower priority layers scale quality using information in the base layer, a loss in the base layer makes related information in the lower priority layers unusable. This point is covered in more detail in the next section. In recognition of this dependence, a second scheme transmits cells from low priority layers only when no cells from high priority layers are awaiting service. This scheme maximizes throughput of the base layer but ignores QoS guarantees for individual layers. Dropping the low priority layers also forfeits any opportunity to employ these layers in error concealment schemes.

Selective prioritization is accomplished by extending the STEBR algorithm discussed in the last section. The linear STEBR algorithm was chosen because it makes optimal scheduling decisions with respect to heterogeneous CBR and VBR traffic and imposes no service penalty on homogeneous traffic. The linear STEBR algorithm is also computationally efficient, since its complexity scales linearly with queue length. The extension described here is posed only with respect to scheduling homogeneous, layered traffic, although the algorithm readily extends to serving heterogeneous non-layered traffic.

Recall that the STEBR algorithm uses the CLPR as a cost function for granting 
service. In particular, each cell is tagged with a speculative cost that represents the 
increase in CLPR if service is denied to that cell. Then, the algorithm divides the queue 
into service intervals and schedules cells from the back of the queue forward. The cell 
with the greatest cost is assigned the current scheduling slot; all other cells move down to 
compete for the next scheduling slot until the first slot is reached. The cell winning the 
first slot is actually granted service. 

To modify the STEBR algorithm for layered traffic, the CLPR of individual layers as well as of their parent connections must be maintained. As each cell arrives at the queue, its connection j and layer number k are determined, and the arrival counts for that connection, A[j], and that layer, AL[j,k], are updated. At the beginning of each scheduling interval, the queue is sorted in order of increasing ToE. Starting from the head of the queue, each cell whose ToE is less than the service time has expired and is dropped. When a cell is denied service, the dropped-cell counts for the affected connection, DS[j], and layer, DSL[j,k], are updated. After the queue is scanned, the current CLPR of each connection and each individual layer is updated. Each connection's CLPR is calculated using Eq. (VI.3), and the CLPR of each layer is calculated in an analogous manner using

$$\mathrm{CLPRL}[j,k] = \frac{DSL[j,k]}{AL[j,k] \times \mathrm{ACLP}[j]} \qquad \text{(VI.7)}$$

where ACLP[j] is the allowable cell loss for connection j. Equation (VI.7) assumes that each layer is assigned the same QoS as the connection. However, Eq. (VI.7) could easily be modified to apply a different QoS to each layer.
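The per-connection and per-layer bookkeeping can be kept in one small structure. The class and method names are hypothetical; the counters follow the text's A, AL, DS, and DSL, and every layer shares its connection's ACLP, as Eq. (VI.7) assumes.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple


class LayeredLossStats:
    """Per-connection and per-layer counters A, AL, DS, DSL with CLPR and CLPRL."""

    def __init__(self, aclp: Dict[int, float]) -> None:
        self.aclp = aclp                                   # ACLP[j], shared by all layers of j
        self.A: DefaultDict[int, int] = defaultdict(int)
        self.DS: DefaultDict[int, int] = defaultdict(int)
        self.AL: DefaultDict[Tuple[int, int], int] = defaultdict(int)
        self.DSL: DefaultDict[Tuple[int, int], int] = defaultdict(int)

    def arrival(self, j: int, k: int) -> None:
        self.A[j] += 1
        self.AL[j, k] += 1

    def drop(self, j: int, k: int) -> None:
        self.DS[j] += 1
        self.DSL[j, k] += 1

    def clpr(self, j: int) -> float:
        a = self.A[j]
        return self.DS[j] / (a * self.aclp[j]) if a else 0.0

    def clprl(self, j: int, k: int) -> float:
        # Per-layer CLPR using the connection's ACLP for every layer.
        a = self.AL[j, k]
        return self.DSL[j, k] / (a * self.aclp[j]) if a else 0.0
```

A single drop from a thin enhancement layer can push that layer's CLPRL above one while the connection as a whole still meets its target; the filtering step decides whether such a layer is allowed to compete.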

The STEBR algorithm is incorporated into layered scheduling in the following manner. At the beginning of each scheduling interval, STEBR is applied to determine which connection receives service, based on the QoS granted to the connection so far and the associated cost function. Layering does not explicitly play a role in determining the connection that receives service. After a connection is granted access to the current time slot, the next decision is to determine the layer within the connection that receives service. The layers compete for service based on each layer's current CLPR. First, since a layer with no cells in the queue does not need to compete for service, the CLPRs of such layers in the winning connection are zeroed out. Second, the remaining layers with non-zero CLPRs are filtered to prioritize transmission consistent with the perceptual importance of each layer. The two schemes mentioned above are implemented using the filtering algorithms shown in Figure VI.4. With bandwidth sharing, the intent is to give higher priority to the more perceptually important layers only when those layers are not receiving their desired QoS; otherwise, all layers are treated equally. This QoS-based prioritization is implemented by zeroing out the CLPR of lower priority layers whenever a higher priority layer is not receiving the requisite QoS, as indicated by a CLPR greater than one. With priority sharing, a lower priority cell receives service only if no higher priority cells are available for service within the queue. This is accomplished by zeroing out the CLPR of lower priority layers whenever the CLPR of a higher priority layer is non-zero, which indicates that the layer has cells available for service. The filtering algorithms shown in Figure VI.4 assume that each connection has three layers.
a) Bandwidth sharing:

    if (CLPRL[j,0] > 1){
        CostL[j,1] = 0;
        CostL[j,2] = 0;
    }
    else if (CLPRL[j,1] > 1){
        CostL[j,2] = 0;
    }

b) Priority sharing:

    if (CLPRL[j,0] > 0){
        CostL[j,1] = 0;
        CostL[j,2] = 0;
    }
    else if (CLPRL[j,1] > 0){
        CostL[j,2] = 0;
    }

Figure VI.4: Cost Filtering per Layer for a) Bandwidth Sharing and b) Priority Sharing.
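The two filters can be combined with the final layer choice in one function. This sketch generalizes Figure VI.4 from three layers to any number of layers: the first layer whose CLPR exceeds the threshold blocks every less important layer. The threshold is 1 for bandwidth sharing and 0 for priority sharing. The generalization, the function name, and the tie-break toward the base layer are assumptions.

```python
from typing import Optional, Sequence


def select_layer(clprl: Sequence[float], has_cells: Sequence[bool],
                 mode: str) -> Optional[int]:
    """Pick the layer of the winning connection that is served (layer 0 = base).

    mode "bandwidth": a more important layer with CLPRL > 1 blocks all lower layers.
    mode "priority":  a more important layer with CLPRL > 0 blocks all lower layers.
    """
    threshold = {"bandwidth": 1.0, "priority": 0.0}[mode]
    # Layers with nothing queued do not compete (their CLPRL is zeroed).
    costs = [c if q else 0.0 for c, q in zip(clprl, has_cells)]
    blocked = set()
    for h, value in enumerate(costs):
        if value > threshold:
            blocked.update(range(h + 1, len(costs)))
            break
    candidates = [k for k in range(len(costs)) if has_cells[k] and k not in blocked]
    if not candidates:
        return None
    # Highest CLPRL wins; ties go to the more important (lower-numbered) layer.
    return max(candidates, key=lambda k: (costs[k], -k))
```

For three layers this reproduces both panels: under bandwidth sharing, CLPRLs of 0.5, 2.0, and 3.0 let layer 1 block layer 2 and win, whereas under priority sharing any non-zero base-layer CLPR keeps the slot in layer 0.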


After filtering, the slot is assigned to the cell from the layer with the highest CLPR. At this point, two options were explored. Earlier work with the BCLPR algorithm indicated that selecting cells deep within a queue has a deleterious effect on throughput [93]. Selecting cells without regard to queue position may lead to a situation in which cells on the verge of expiration are ignored in order to serve a cell from a connection with a higher cost, even though that cell is in no immediate danger of expiration. The STEBR algorithm [39] corrects this by comparing the cost of denial of service for each connection on a global basis. However, the filtering algorithms reintroduce this problem to a certain extent by bypassing cells from a lower priority layer to serve cells from higher priority layers as needed. Arguably, this is intentional, since without the higher priority layers the lower priority layers provide no benefit to the receiver, and a lower throughput is acceptable to ensure that the appropriate cells are delivered. The tradeoff between throughput and priority service is examined by implementing service deferral. The ToE of the cell selected for service during the current time slot is examined. If the ToE indicates that the cell is not due to expire during the next time slot, service is deferred, and the cell from that connection closest to expiration is selected for service instead. Service deferral therefore reverts to STE [99] within a connection whenever possible.
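Service deferral adds one check after layer selection. The sketch assumes that "not due to expire during the next time slot" means the selected cell's latest feasible slot, floor(ToE × C_l), is at least 2. The tuple layout and function name are hypothetical.

```python
import math
from typing import List, Tuple

QueuedCell = Tuple[int, float]  # (layer, ToE in seconds) for a cell of the winning connection


def serve_with_deferral(selected: QueuedCell, connection_cells: List[QueuedCell],
                        capacity: float) -> QueuedCell:
    """If the layer-selected cell could still be served in the next slot
    (floor(ToE * capacity) >= 2), serve the connection's cell closest to expiration instead."""
    if math.floor(selected[1] * capacity) >= 2:
        return min(connection_cells, key=lambda c: c[1])   # revert to STE within the connection
    return selected
```

When the selected cell is itself urgent, it is served as chosen; otherwise the slot goes to whichever cell of the same connection is nearest expiration, even if that cell belongs to a less important layer.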




A complete summary of the algorithm is given in Figure VI.5. An OPNET model that implements STEBR for layered video traffic is given in Appendix A.


1. Sort the queue in order of increasing ToE from the head of the queue.
2. Scan the queue from head to tail. For each cell from connection j, layer k:
   a. Calculate the cell's ToE.
   b. If the ToE is less than the service interval:
      i. Discard the cell.
      ii. Increment DS[j] and DSL[j,k].
3. Update CLPR, CLPRL, and A for each connection and layer using Eq. (VI.2), Eq. (VI.3), and Eq. (VI.7).
4. Assign a connection cost to each cell using Eq. (VI.4) and Eq. (VI.5).
5. Assign each cell to a tentative time slot n = ⌊ToE × C_l⌋ (Eq. (VI.6)).
6. Assume that after step 5, N time slots are allocated.
7. For each time slot n from N down to 1:
   a. For every cell i in that time slot from connection j, layer k:
      i. If Cost[j] > 0:
         1. Increment Extra_Cells[j].
      ii. Else:
         1. Set Cost[j] = Cell_Cost[i].
   b. Find the largest Cost[j]. Assume the connection is j_w.
   c. If Cost[j_w] > 0, there is at least one cell awaiting service:
      i. If Extra_Cells[j_w] = 0:
         1. Set Cost[j_w] = -1.
      ii. Else:
         1. Decrement Extra_Cells[j_w].
         2. Reduce Cost[j_w] by Δ_{j_w}.
8. Connection j_w is assigned the time slot.
   a. For each layer k with no cells enqueued, set CLPRL[j_w,k] = 0.
   b. Filter the cost for each layer using Figure VI.4.
   c. Assume the winning layer is k_w:
      i. With service deferral:
         1. If ⌊ToE × C_l⌋ ≥ 2 for the selected cell:
            a. Serve the cell from j_w with the lowest ToE.
         2. Else:
            a. Serve the first cell from layer k_w.
      ii. Otherwise, serve the first cell from layer k_w.

Figure VI.5: Modified STEBR Algorithm.
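Step 7 of Figure VI.5 avoids rescanning every cell by keeping one running cost and a count of extra cells per connection. The sketch below follows the reading assumed here: a connection already holding a cell counts further cells as extras, and when it wins a slot its cost drops by one increment Δ, or is retired with -1 when no extras remain. The data layout and function name are hypothetical. The sketch also assumes that each connection's newest, highest-cost cell is met first in the backward scan, so the Δ reductions stay exact.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def linear_stebr_winner(slots: Dict[int, List[Tuple[str, str]]],
                        cell_cost: Dict[str, float],
                        delta: Dict[str, float]) -> Optional[str]:
    """Return the connection j_w awarded slot 1, or None.

    slots maps slot n -> [(cell_id, connection)], listed so that each connection's
    newest (highest-cost) cell is met first when scanning n = N..1.
    """
    cost: Dict[str, float] = defaultdict(float)   # Cost[j]; <= 0 means no cell held
    extra: Dict[str, int] = defaultdict(int)      # Extra_Cells[j]
    slot_winner: Optional[str] = None
    for n in range(max(slots, default=0), 0, -1):
        for cell_id, j in slots.get(n, []):        # step 7a
            if cost[j] > 0:
                extra[j] += 1                      # j already has a higher-cost cell in play
            else:
                cost[j] = cell_cost[cell_id]
        slot_winner = None
        if cost:
            j_w = max(cost, key=cost.get)          # step 7b
            if cost[j_w] > 0:                      # step 7c
                slot_winner = j_w
                if extra[j_w] == 0:
                    cost[j_w] = -1.0               # no cells of j_w left in play
                else:
                    extra[j_w] -= 1
                    cost[j_w] -= delta[j_w]        # next cell of j_w costs one Delta less
    return slot_winner
```

On the same inputs, this bookkeeping should award slot 1 to the same connection as a version that keeps every cell explicitly; it simply replaces per-cell comparisons with one counter and one running cost per connection.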
2. GOB Dropping


As discussed in Chapter II, correct decoding of the compressed video bit stream requires that the decoder stay in sync with the bit stream. Bit errors and dropped cells interrupt the decoding process and force the decoder to scan the bit stream until a distinctive codeword is found to reset the decoding process. This is part of the rationale for imposing a logical hierarchy on the bit stream (see Figure III.1). For the coder proposed here, the information required to start the decoding process includes the start of the next macroblock, the scene type, and the current quantizer setting. Other coders might require additional or different information¹⁵. Since repeating this information consumes bandwidth, a tradeoff is forced between minimizing this overhead and minimizing the distance, in macroblocks, between resynchronization points. Most coders therefore support resynchronization at the start of each GOB¹⁶. As a result, after a stream error, the decoder parses the bit stream until a GOB header is recognized and restarts decoding at that point. The intervening data between the stream error and the GOB header is discarded, and the effect on the display is left up to the decoder.

The effect of dropped cells on the decoder has strong implications for the layered scheduling algorithm proposed in the last section. As shown in Figure VI.6, a cell dropped from within a GOB corrupts the GOB. Any cells remaining in the GOB are unusable, since their payload will ultimately be discarded at the decoder. In this case, making scheduling decisions based on CLPR is suboptimal, since CLPR no longer gives a valid indication of the impact of denying service on reconstructed visual quality at the recipient. Indeed, dropping the remaining cells in the corrupt GOB does not further degrade the quality of the reconstructed frame beyond that imposed by the original cell drop. However, the effect of dropping the unusable cells is not merely neutral. Removing these cells from contention increases the number of scheduling opportunities for cells that can still be successfully decoded. Therefore, in a global sense, the overall effect on the reconstructed quality of all competing connections is positive, especially if the released scheduling opportunities are biased toward the higher priority layers in each connection. Since the layered STEBR algorithm filters costs by layer, this is the expected outcome.

¹⁵ An MPEG decoder, for example, would need the frame type (I, P, or B) [6].
¹⁶ H.263 has a low bit rate mode that eschews GOB headers and resynchronizes only at frame headers [56].


[Figure: ATM cell flow through one GOB, marking the cell denied service and the unusable cells that follow it]

Figure VI.6: The Effect of Cell Discard on a GOB.


To illustrate these points, consider the example shown in Figure VI.7. A scheduling slot k contains a layer 2 cell from connection A and layer 0 cells from connections B and C, with the connection costs shown. The layered STEBR mechanism grants the slot to the connection with the greatest overall cost and then filters by layer. Here, connection A would be granted the slot. Now assume that the layer 2 cell belongs to a broken GOB. Granting service to A will not improve the recipient’s quality, and denying service to connections B and C potentially corrupts two additional GOBs. Denying service to A, while appearing to degrade QoS to that connection, actually provides a global benefit, since connection B receives an additional scheduling opportunity.
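[Figure: scheduling slot k with the competing cells and their connection costs]
Cell (connection-layer):   A-2    B-0    C-0
Connection cost:           0.91   0.85   0.83

Figure VI.7: Competition Between Usable and Unusable Cells.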
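The slot decision in this example can be reproduced in a few lines of C. This is a simplified sketch: it takes connection costs of 0.91, 0.85, and 0.83 for A, B, and C as in Figure VI.7 and does not model how STEBR computes those costs or how it filters by layer; the candidate type, its usable flag, and pick_winner are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One head-of-line cell competing for scheduling slot k. */
typedef struct {
    char   conn;    /* connection label, e.g. 'A' */
    int    layer;   /* 0 = base layer, 1 and 2 = enhancement layers */
    double cost;    /* connection cost supplied by the scheduler */
    bool   usable;  /* false if the cell belongs to an already broken GOB */
} candidate;

/* Return the index of the highest-cost candidate, or -1 if none qualifies.
 * When skip_unusable is true, cells from broken GOBs are removed from
 * contention first, which is the effect partial GOB dropping aims for. */
int pick_winner(const candidate *c, size_t n, bool skip_unusable)
{
    int best = -1;

    for (size_t k = 0; k < n; k++) {
        if (skip_unusable && !c[k].usable)
            continue;
        if (best < 0 || c[k].cost > c[best].cost)
            best = (int)k;
    }
    return best;
}
```

With the broken GOB's cell left in contention, A wins the slot; with it removed, the slot passes to B, matching the outcome described above.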
An analogous situation occurs with UBR connections carrying IP datagrams. If a cell from a datagram is discarded, the remaining cells are unusable. If the IP datagrams belong to a TCP connection, a single dropped cell forces the entire TCP segment to be retransmitted. As retransmissions reduce effective throughput, techniques such as partial packet discard respond to a dropped cell by dropping the remaining cells in the datagram. By removing these unusable cells from contention for scheduling, effective throughput is increased [18].

Based on the discussion above, we present a modification to the layered STEBR scheduling algorithm that implements partial GOB dropping. Partial GOB dropping consists of removing any cells remaining in a GOB following a dropped cell in that GOB. A similar approach proposed for high bandwidth MPEG-2 video traffic by Kuo and Ko [100] schedules slices for transmission only if sufficient bandwidth is available to transmit an entire slice without loss. The approach here is less stringent because scheduling assignments are made based on current queue occupancy, delay considerations do not allow the GOB length to be determined in real time for low bit rate video traffic, and some partial benefit is derived from transmitting at least the beginning of the GOB.

Since the video stream is layered, partial GOB dropping must account for both dropped cells within each GOB and the impact of GOB corruption in one layer on related GOBs in other layers. The greatest impact occurs when a GOB from the base layer is corrupted. In that case, at least part of the information carried within the associated GOBs of lower priority layers is also rendered unusable. Corruption of a lower priority GOB does not appear to have the same consequence: based on subjective and quantitative evaluations using the coder from Chapter IV, a tangible benefit is obtained by decoding and applying a lower priority enhancement layer regardless of whether higher priority enhancement layers are successfully decoded.

Based on these observations, partial GOB dropping is implemented in the following manner. If a cell is discarded from a base layer GOB, all remaining cells in that GOB and all remaining cells in the associated lower priority layer GOBs are discarded. If a cell is dropped from within an enhancement layer GOB, all remaining cells in that layer’s GOB are discarded. The impact of a cell discard in each situation is illustrated in Figure VI.8.


[Figure: panels a) and b) mark the cell denied service and the resulting unusable cells]

Figure VI.8: Discard Policy Following a Cell Loss from: a) Base Layer GOB or b) Enhancement Layer GOB.


The base layer discard policy is actually somewhat severe, since a cell loss from a base layer GOB does not always invalidate information in enhancement layer GOBs. Technically, for decoding purposes, a loss from the base layer GOB invalidates only the information in enhancement layers starting at the same spatial position, i.e., the same macroblock. Any information prior to this point is still usable, although coordinating the spatial relationship of cells in different layers is not a trivial task. One possible approach is to interleave cells from different layers in a manner that approximates their spatial dependence, so that when a cell from the base layer is dropped, the loss of usable information is minimized when the remaining cells in the base and enhancement layers are dropped. This approach is shown in Figure VI.9, where cells from the layer GOBs have been interleaved according to their spatial dependence. A loss of a base layer cell now results in a smaller number of cell discards compared to Figure VI.8. With the current coder, this is not an issue, since at low bit rates the enhancement layers are usually restricted to a single ATM cell in length.


[Figure: ATM cell flow with interleaved layer cells, marking the cell denied service and the unusable cells]

Figure VI.9: Interleaving Layer Cells to Minimize Information Loss.


Partial GOB dropping is implemented in the scheduler as follows. For each connection, a flag is maintained for each layer, i.e., three flags per video source. The flag records the state of the current GOB in that layer, ‘RETAIN’ or ‘DROP’, and indicates whether the remaining cells in that GOB should be retained or dropped. Assuming that the current GOB has remained intact so far, a cell dropped due to expiration changes its status from RETAIN to DROP. If the expired cell belongs to the base layer, the flags for the associated lower priority layer GOBs are also set to DROP. Each layer’s flag is reset to RETAIN at the start of a new GOB, as indicated by either the SDU bit or a change in cell tags (see Figure II.11 and Figure II.13).

At the start of each scheduling slot, the queue is scanned from head to tail as previously described. The scheduler performs different actions for each cell depending on the status of its parent GOB. If the GOB status is RETAIN, the cell’s ToE is calculated. If the cell has expired, the cell is dropped, and the GOB’s status is changed to DROP; if the cell belonged to the base layer, the status of the associated enhancement layer GOBs is also set to DROP. If the GOB status is DROP, the cell is examined to determine whether it contains a GOB header, which indicates the start of a new GOB. If it does and the cell has not expired, the GOB status is toggled back to RETAIN. Otherwise, the cell is dropped regardless of its ToE. This algorithm is summarized in Figure VI.10.
1. Scan the queue from head to tail.
2. For each cell from connection i, layer j:
   A. If status[i][j] = RETAIN:
      a. Calculate ToE.
      b. If ToE < service time:
         i. Status[i][j] = DROP.
         ii. Discard cell.
         iii. If j = 0:
              1. Status[i][k] = DROP for all k > 0.
   B. If status[i][j] = DROP:
      a. Check for GOB header.
      b. If new GOB header:
         i. Calculate ToE.
         ii. If ToE < service time:
             1. Discard cell.
         iii. Else:
             1. Status[i][j] = RETAIN.
      c. Else:
         1. Discard cell.

Figure VI.10: Partial GOB Dropping Algorithm.
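A minimal C rendering of Figure VI.10 follows. It assumes three layers per connection, a per-cell time of expiry, and a gob_header flag already derived from the SDU bit or a tag change; the type and function names are hypothetical and are not taken from the OPNET model in Appendix A. The pass only performs the discards; granting the slot to a surviving cell is left to the layered STEBR selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LAYERS 3   /* base layer plus two enhancement layers */

typedef enum { RETAIN, DROP } gob_status;

typedef struct {
    int    conn;        /* connection index i */
    int    layer;       /* layer index j, 0 = base layer */
    bool   gob_header;  /* cell starts a new GOB in its layer */
    double toe;         /* time of expiry */
    bool   discarded;   /* set when the scan discards the cell */
} cell;

/* Set a GOB to DROP; a base layer drop also marks the enhancement GOBs. */
static void mark_drop(gob_status status[][NUM_LAYERS], int i, int j)
{
    status[i][j] = DROP;
    if (j == 0)
        for (int k = 1; k < NUM_LAYERS; k++)
            status[i][k] = DROP;
}

/* One head-to-tail pass over the queue at service time 'now'. */
void partial_gob_scan(cell *q, size_t n,
                      gob_status status[][NUM_LAYERS], double now)
{
    for (size_t c = 0; c < n; c++) {
        cell *p = &q[c];
        int i = p->conn, j = p->layer;

        assert(j >= 0 && j < NUM_LAYERS);
        if (p->discarded)
            continue;
        if (status[i][j] == RETAIN) {
            if (p->toe < now) {              /* expired: GOB is now broken */
                mark_drop(status, i, j);
                p->discarded = true;
            }
        } else if (p->gob_header && p->toe >= now) {
            status[i][j] = RETAIN;           /* intact start of a new GOB */
        } else {
            p->discarded = true;             /* rest of a broken GOB */
        }
    }
}
```

In this sketch a single expired base layer cell takes the rest of its GOB and the associated enhancement layer cells with it, while a fresh base layer GOB header restores RETAIN for the base layer.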


D. RESULTS 


Performance of the layered STEBR algorithm was validated using OPNET. The scenario simulated was a network configured as shown in Figure VI.11 with three layered video sources. Each layered source transmits at a mean bit rate of 80 kbps and is represented within the simulation using the MMRP traffic model discussed in Chapter V. An OPNET model for a layered video source is given in Appendix A, and the model parameters are given in Appendix B. The bit allocation among the layers was set at 2:1:1. The requested QoS for each connection consists of a maxCTD of 50 ms and a CLR of 10°. Each layer is assigned the same CLR. While this CLR is high for video traffic, the value chosen shortens simulation time while still giving a valid demonstration of the algorithm’s behavior under different loads. Since the performance of the STEBR algorithm with heterogeneous traffic has been presented thoroughly elsewhere [39], only the homogeneous traffic case is considered here.
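For a sense of scale, the per-layer rates implied by an 80 kbps mean rate and a 2:1:1 allocation can be worked out directly. The sketch assumes that each cell carries a full 48-octet payload with no AAL overhead, which the text does not state, so the resulting cell rates are lower-bound estimates; layer_rates and cell_rate are illustrative helpers.

```c
#include <assert.h>

/* Assumption: every cell carries a full 48-octet payload (no AAL overhead). */
#define PAYLOAD_BITS (48 * 8)

/* Split a mean source rate (bits/s) among k layers using integer weights,
 * e.g. {2, 1, 1} for the 2:1:1 allocation used in the simulations. */
void layer_rates(double mean_bps, const int *weights, int k, double *out_bps)
{
    int total = 0;

    for (int j = 0; j < k; j++)
        total += weights[j];
    assert(total > 0);
    for (int j = 0; j < k; j++)
        out_bps[j] = mean_bps * weights[j] / total;
}

/* Mean cell rate (cells/s) needed to carry a given bit rate. */
double cell_rate(double bps)
{
    return bps / PAYLOAD_BITS;
}
```

Under this assumption the base layer carries 40 kbps, or roughly 104 cells/s, so on average only about five base layer cells arrive within one 50 ms maxCTD window.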


[Figure: Layered Sources 1, 2, and 3 feed a queue on the network access link]

Figure VI.11: Network Scenario.


The first issue considered was the ability of the QoS filtering algorithms listed in Figure VI.4 to shift bandwidth to the higher priority base layer as network load was increased to simulate congestion, together with the corresponding impact on connection throughput. The first filtering approach considered was bandwidth sharing.

The premise of service deferral is sustaining the maximum possible throughput by deferring service of a selected cell, provided that the cell will not expire if it is not granted immediate service. Figure VI.12 shows the impact of service deferral on the CLR for each layer as network load is increased. As long as the base layer is receiving its required QoS, all layers are treated in approximately the same manner. As network load increases and connections experience CLRs exceeding the required CLR of 10°, the scheduler adapts by denying service to the enhancement layers whenever possible. However, with service deferral, cells from lower priority layers are still granted service unless a higher priority cell is present and about to expire. As a result, although the scheduler violates QoS for the base layer last, QoS cannot be maintained over a wide range of loads. Consequently, the gap in CLR between the base and enhancement layers stays relatively constant at one order of magnitude, and QoS between the enhancement layers is not differentiated at all.
Figure VI.12: CLR for Bandwidth Sharing and Service Deferrals.


The same scenario without service deferral is shown in Figure VI.13. Now a higher priority cell receives priority service if its layer is not receiving its requisite QoS, regardless of the cell’s position within the queue. As a result, the required QoS for the base layer is maintained as network load increases, and a clear delineation exists in the treatment of the enhancement layers. Comparing Figure VI.13 with Figure VI.12, the bandwidth required to maintain QoS for the base layer comes primarily from denying service to layer 2 cells (the second enhancement layer), as desired.
Figure VI.13: CLR for Bandwidth Sharing and No Service Deferrals.


The performance of priority sharing was also considered with and without service deferral. With service deferral, the result is identical to Figure VI.12: service deferral renders the cost function irrelevant, since the cost function is effectively applied only if the chosen cell is about to expire. Without service deferral, the impact of priority sharing is shown in Figure VI.14. Since the base layer receives priority any time a cell is present, the scheduler prevents any observable cell loss in the base layer for the network loads examined. Once again, the bandwidth required comes at the expense of the second enhancement layer, as desired. Meanwhile, the first enhancement layer receives the best service of all three scenarios.
Figure VI.14: CLR for Priority Sharing and No Service Deferrals.


Comparing Figures VI.12 through VI.14, priority sharing without service deferrals gives the best performance with respect to maintaining or exceeding the QoS for the layers in the desired hierarchical order. However, the QoS of the base layer cannot be arbitrarily controlled without impacting the connection’s throughput. The throughput for each of the scenarios above is shown in Figure VI.15 and indicates that closer regulation of the CLR comes at the price of decreased throughput for that connection. Since some loss can be tolerated in the base layer, as indicated by the QoS parameters supplied as part of the traffic contract, the priority sharing algorithm was deemed unsuitable. The remaining discussion covers only the bandwidth sharing algorithm.
Figure VI.15: Throughput under Each Scheduling Scheme.


The next issue considered is the effect of partial GOB dropping for each of the two remaining scenarios. With GOB dropping, throughput is expected to decrease, since part of the traffic that would otherwise be transmitted is unusable at the decoder and is now discarded. The results for bandwidth sharing and service deferral are shown in Figure VI.16. Compared to Figure VI.12, better performance is delivered in terms of CLR for each layer, although the difference grows successively smaller at higher network loads. The improvement is most notable for the second enhancement layer. A marked differentiation in QoS between the two enhancement layers, which did not exist prior to GOB dropping, is also observed.
Figure VI.16: CLR for Bandwidth Sharing, Service Deferrals, and GOB Dropping.
The effect of GOB dropping without service deferrals is shown in Figure VI.17. The scheduler is still able to maintain the requisite CLR for the base layer. The effect on the enhancement layers is mixed. Control over the CLR for the first enhancement layer is improved relative to Figure VI.13 at network loads below 0.8; above this point, CLR increases. The CLR for the second enhancement layer is higher regardless of the network load, and this greater loss is what yields the improved CLR observed for the first enhancement layer at lower network loads. At higher network loads, the impact of cell drops from the base layer dominates. Since a cell dropped from a base layer GOB causes the associated first and second enhancement layer GOBs to be discarded, the CLRs for the first and second enhancement layers start to converge.
Figure VI.17: CLR for Bandwidth Sharing, No Service Deferrals, and GOB Dropping.


The impact of GOB dropping on throughput, with and without service deferral, is 
shown in Figure VI.18. Remarkably, in both cases, only a small decrease is observed in 
throughput and then only at high network loads. However, service deferrals still result in 
higher throughput overall. 

Considering the joint effects of service deferral and GOB dropping on layered scheduling, the scheduler is able to more aggressively utilize bandwidth released by dropping non-viable cells to improve service for all layers. However, service deferral is unable to maintain the requisite CLR for the base layer at high network loads, with or without GOB dropping. Without service deferral, throughput is reduced, since the scheduler gives greater priority to winning cells, which tend to be base layer cells at the higher loads. GOB dropping does allow the scheduler to reallocate bandwidth, at the expense of the second enhancement layer, to improve the CLR for the first enhancement layer at network loads below 0.8 while maintaining the required service for the base layer.
Figure VI.18: Throughput Variation with Partial GOB Dropping.


Comparing Figure VI.16 with Figure VI.17, service deferrals actually produce slightly better overall service, as demonstrated by lower CLR for the base layer and higher connection throughput, for network loads at which the base layer CLR stays below the target CLR. As network load increases, forgoing service deferrals results in better service to the base layer in terms of reduced CLR. These results suggest that the most effective scheduling scheme is actually a hybrid of the two approaches: use service deferrals when the base layer is receiving its required QoS and drop service deferrals when it is not.
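The hybrid policy suggested above could be driven by a small controller that switches service deferral on or off from a measured base layer CLR. This is a sketch under stated assumptions: the hysteresis factor is an addition not discussed in the text (it keeps the scheduler from toggling on every measurement), and deferral_ctl and update_deferral are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool   deferral_on;  /* current mode of the scheduler */
    double target_clr;   /* base layer CLR from the traffic contract */
    double hysteresis;   /* assumed factor in (0, 1]: re-enable below target * hysteresis */
} deferral_ctl;

/* Update the mode from the latest base layer CLR estimate. */
void update_deferral(deferral_ctl *c, double measured_base_clr)
{
    assert(c->hysteresis > 0.0 && c->hysteresis <= 1.0);

    if (c->deferral_on && measured_base_clr > c->target_clr)
        c->deferral_on = false;   /* base layer QoS violated: stop deferring */
    else if (!c->deferral_on &&
             measured_base_clr < c->target_clr * c->hysteresis)
        c->deferral_on = true;    /* comfortably within target: defer again */
}
```

Setting the hysteresis factor to 1 recovers the plain rule stated above; smaller values trade responsiveness for fewer mode changes.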

The final issue examined is how cells from related GOBs are arranged within the cell flow. Each base layer GOB has two associated enhancement layer GOBs. The partial GOB dropping algorithm discards upper layer cells whenever a base layer cell is discarded. However, the number of cells actually discarded depends on whether the cells from the individual layers are concatenated or interleaved in a manner that reflects the actual spatial dependency among the cells in the different layers, as shown in Figure VI.9. The goal is to minimize information loss by dropping only those upper layer cells that are rendered unusable by a drop in the base layer.
To examine this idea, the bit allocation among the layers was increased to 4:2:2. While this is the same relative ratio considered earlier, each layer’s GOB is now doubled in size to increase the effect of partial GOB dropping. Two arrangements were considered, as shown in Figure VI.19. The first arrangement concatenates cells from each layer; the second interleaves them. In either case, base layer GOB headers occur every eight cells.
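The benefit of interleaving can be estimated by counting discards for one eight-cell group. The sketch assumes that each base layer cell is equally likely to be the one lost and that every cell queued behind the lost cell in the group becomes unusable; for the concatenated order this matches the discard policy above, while for the interleaved order it is the idealized spatial dependence of Figure VI.9. The interleaved order used here (B B E1 E2 B B E1 E2) is itself an assumption, since the exact layout of Figure VI.19 is not reproduced, and mean_discards is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Layer of each cell in one eight-cell group: 0 = base, 1 and 2 = enhancement. */
static const int concatenated[8] = {0, 0, 0, 0, 1, 1, 2, 2};
static const int interleaved[8]  = {0, 0, 1, 2, 0, 0, 1, 2};  /* assumed order */

/* Average number of cells discarded per base layer cell loss, assuming each
 * base cell is equally likely to be lost and all cells queued behind it in
 * the group become unusable. */
double mean_discards(const int *flow, size_t n)
{
    int losses = 0, total = 0;

    for (size_t p = 0; p < n; p++) {
        if (flow[p] != 0)
            continue;                 /* only base layer losses are counted */
        losses++;
        total += (int)(n - p - 1);    /* cells behind the lost one */
    }
    assert(losses > 0);
    return (double)total / losses;
}
```

Under these assumptions, interleaving lowers the expected discard from 5.5 to 4.5 cells per base layer loss, which is the direction of the effect examined in the simulations that follow.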





[Figure: ATM cell flow for the two cell arrangements]

Figure VI.19: Cell Arrangements Considered for a 4:2:2 Bit Allocation: a) Concatenated or b) Interleaved.


The effect of each cell arrangement using bandwidth sharing and service deferrals is shown in Figure VI.20. For the base and first enhancement layers, interleaving cells from different layer GOBs improves CLR consistently regardless of the network load. Not surprisingly, the improved CLR comes at the expense of higher CLR for the second enhancement layer over the range of network loads examined. However, performance is judged unacceptable since, although a clear differentiation in service exists for each layer, QoS degrades for each layer at approximately the same rate instead of favoring the base layer at the higher network loads. Throughput differences for each case were negligible.
Figure VI.20: Relative Effect of Interleaving and Concatenating on CLR with Bandwidth Sharing and Service Deferrals.


The effect of each cell arrangement using bandwidth sharing without service deferrals is shown in Figure VI.21. Interleaving gives a performance benefit similar to the one discussed in the last paragraph: CLRs are improved for both the base and first enhancement layers. There are two notable distinctions between concatenating and interleaving. As observed previously, forgoing service deferrals allows the scheduler to maintain the requisite QoS for the base layer, and Figure VI.21 shows the same behavior whether cells are concatenated or interleaved; however, interleaving still improves the base layer CLR by a small measure at each network load examined. For the first enhancement layer, unlike in previous simulations, interleaving allows QoS to be maintained up to a network load of 0.8, although CLR increases rapidly beyond this point. Also notable is the observation that interleaving improves the CLR for the second enhancement layer up to network loads exceeding 0.8, although performance degrades relative to concatenation beyond that point. Again, throughput differences for each case were negligible.
Figure VI.21: Relative Effect of Interleaving and Concatenating on CLR with Bandwidth Sharing and Without Service Deferrals.


This chapter presented a scheduling algorithm for layered video traffic based on the STEBR algorithm originally proposed by Uziel [39]. The STEBR algorithm provides optimal scheduling for heterogeneous traffic, where each connection possibly has different CLR and CTD requirements. The hierarchical nature of layered video is introduced through a prioritization scheme that denies service to cells from lower priority layers during periods of congestion, thereby increasing the probability that cells from higher priority layers are transmitted. In this manner, the quality of the reconstructed video degrades more gracefully than if cells were dropped indiscriminately from the connection. Effective throughput is increased through partial GOB dropping, which drops cells determined to be unusable at the decoder. Dropping these cells increases scheduling opportunities for viable cells and increases the probability of transmitting higher priority GOBs intact.







VII. CONCLUSIONS


A. SUMMARY OF WORK 


Motivated in part by the US Navy’s IT-21 initiative, there has been considerable 
interest in deploying multimedia applications over tactical networks. Tactical networks 
may be characterized as low bit rate, unreliable, and heterogeneous. Multimedia 
applications, especially those incorporating video, tend to be bandwidth intensive and 
sensitive to transmission errors. Traditional multimedia processing techniques do not 
take these constraints into account. 

This work investigated issues related to distributing low-bit-rate video within the 
context of a teleconferencing application deployed over a tactical ATM network. The 
main objective was to develop mechanisms that support transmission of low-bit-rate 
video streams as a series of scalable layers that progressively improve quality. These
mechanisms exploit the hierarchical nature of the layered video stream along the 
transmission path from the sender to the recipients to facilitate transmission. 
Specifically, the approach proposed in this dissertation works across the application- 
network interface by coding the video stream into layers, shaping the resulting layered 
video stream prior to entry into the network, and prioritizing service in accordance with 
the relative perceptual importance of each layer. 

A new layered video coding scheme was developed that includes a number of original contributions. This work codified some of the design issues required for an effective layered coder. How to layer the video stream effectively is an elementary design issue. To address this, a series of heuristic rules was proposed that leads to effective layering structures for motion video via wavelet-based subband decomposition. These rules stem from a simple split-and-merge algorithm that uses subband variance as a measure of perceptual relevance. By grouping subbands of like variance and assigning subbands to layers in order of perceptual importance, the video stream is divided into the requisite number of layers. We applied this heuristic rule set and devised a three-layer coding scheme for low-motion video. Employing a common layering scheme for both motion video and static presentation slides yielded poor results due to their different energy distributions among the subbands and the differing perceptual weighting of high frequency content. Consequently, we devised a separate scheme for slides in which each layer incorporates contributions from all frequency bands.

A new suboptimal rate control scheme for layered video was developed. Using 
classical rate-distortion approaches, constraining the bit rate for a layered video stream 
using k quantizers involves simultaneously solving k cost functions. In this work, a 
simpler approach replaced the k-dimensional rate-distortion problem with a one- 
dimensional operational rate-distortion curve generated from a set of suboptimal 
quantizer vectors. Rate control is then implemented via a table lookup into a codebook 
containing the suboptimal quantizer vectors. 
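Rate control by table lookup can be sketched as a search over a codebook sorted by operational rate. The quantizer vectors and rates in this example are invented for illustration, the per-frame bit budget is assumed to be supplied by the rate controller, and qvec_entry and rc_lookup are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LAYERS 3

/* One codebook entry: a quantizer vector and its operational rate. */
typedef struct {
    int    q[NUM_LAYERS];  /* quantizer step for base, E1, E2 (illustrative) */
    double bits;           /* bits per frame produced with this vector */
} qvec_entry;

/* The codebook is sorted by increasing rate, so a binary search finds the
 * highest-rate (finest) entry that does not exceed the frame budget. If even
 * the coarsest entry exceeds the budget, the coarsest (index 0) is returned. */
size_t rc_lookup(const qvec_entry *cb, size_t n, double budget_bits)
{
    assert(n > 0);

    size_t lo = 0, hi = n;  /* entries before lo fit; entries from hi on do not */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cb[mid].bits <= budget_bits)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? 0 : lo - 1;
}
```

Because the k-dimensional search has been collapsed into one sorted list, each frame costs only a logarithmic number of comparisons, which suits real-time operation.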

The effect of traffic smoothing, prior to network entry, on queuing performance and scheduling efficiency was examined. The approach investigated smoothing at three time scales: frame, layer, and cell interarrival. Smoothing at the frame level is performed by the rate controller and requires no special implementation. Smoothing within the frame is accomplished using a leaky-bucket mechanism whose token rate changes each frame. Implementations were proposed for transmitting layers over a single VCI and over multiple VCIs, and the implications of positioning the leaky bucket prior to the ATM layer were considered.
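The within-frame smoother can be sketched as a token bucket whose rate is recomputed for every frame so that the frame's cells are spread across one frame interval. The sketch assumes that all cells of a frame are ready at the frame start and that the bucket starts each frame full; smooth_frame and its parameters are hypothetical. With a depth of one token, the output reduces to evenly spaced cells.

```c
#include <assert.h>
#include <stddef.h>

/* Compute departure times for the n cells of one frame.
 * frame_start, frame_interval: seconds; depth: bucket size in cells (>= 1).
 * The token rate is recomputed per frame as n / frame_interval. */
void smooth_frame(double frame_start, double frame_interval,
                  size_t n, double depth, double *dep)
{
    assert(frame_interval > 0.0 && depth >= 1.0);
    if (n == 0)
        return;

    double rate = (double)n / frame_interval;  /* tokens (cells) per second */

    for (size_t k = 0; k < n; k++) {
        /* Cell k needs k + 1 tokens in total; 'depth' are available at once. */
        double wait = ((double)(k + 1) - depth) / rate;
        dep[k] = frame_start + (wait > 0.0 ? wait : 0.0);
    }
}
```

A larger depth lets the first few cells of a frame leave back to back, trading some burstiness for lower smoothing delay.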

The problem of prioritizing cell scheduling in layered video traffic was investigated to enable a more graceful degradation in received video quality during periods of high cell loss. QoS at the connection level is maintained using the STEBR algorithm originally proposed by Uziel [39]. Within the connection, a prioritization scheme denies service to cells from lower priority layers as required to maintain the requisite QoS, in terms of cell loss rate, for higher priority layers. This ensures that reconstructed video quality degrades more gracefully than if cells were dropped indiscriminately from the connection. Since the decoder resynchronizes using GOB headers following data loss, a cell dropped within a GOB renders any remaining cells in the GOB unusable. We proposed partial GOB dropping to increase effective throughput by intelligently discarding related cells deemed unusable that would otherwise compete for and waste scheduling opportunities.

Scheduling at the layer level, in addition to the connection level, requires a means for associating cells with layers. Also, partial GOB dropping requires the scheduler to be able to identify GOB headers within each layer. Two approaches were considered. The first approach assigns each layer to a separate VCC using AAL5. This approach is the simplest in terms of implementation but requires increased signaling in multicast scenarios. The second approach multiplexes each layer across a single VCC using AAL2. This approach offers quicker call establishment and minimizes signaling in multicast scenarios but requires modification to the CPCS sublayer and does not scale beyond four layers.

B. SUGGESTIONS FOR FUTURE RESEARCH


The coder as proposed in Chapter IV supports only 8-bit grayscale video. Extension to 24-bit color video is a natural step in the maturation of the coder design. Video capture usually results in three planes, a luminance plane and two color difference planes, each with the same resolution as the original frame. Since the HVS is more sensitive to variations in luminance than in color, the color planes are normally subsampled relative to the luminance plane [6]. With 4:2:0 subsampling, each 16x16 macroblock in the original frame is represented as a 16x16 luminance macroblock and two 8x8 color difference macroblocks. The work presented in Chapter IV applies only to the luminance portion of the picture. More research is required to investigate a general layering structure for the color difference components. While the frequency characteristics of the color components might be expected to mirror those of the luminance components, their perceptual importance clearly does not. In the quantization matrix suggested for the color components by the JPEG standard, little discrimination is made between low and high frequencies or between vertical and horizontal detail [66]. Whether a separate approach is required for the color content of static slides also bears consideration.

One area not fully exploited by the proposed coder is recent advancements in 
entropy coding. One promising area of research is the concept of reversible codes, i.e., 
codes that are uniquely decipherable by parsing forward or backward through the 
bitstream. With a reversible code, the decoder would respond to a stream interruption by 
buffering the bitstream until the next GOB header is located. Then the decoder could 
parse backwards to recover a portion of the corrupted GOB. An interesting analysis 
could focus on the relative benefits of reversible coding and partial GOB dropping since 
the two approaches could not coexist. 

Other issues concerning the coder design that were only partially investigated 
include rate control at the macroblock level and error concealment schemes at the 
decoder. The results presented in Chapter IV only incorporate rate control at the frame 
level, in which the quantizer vector is changed solely at the beginning of each new frame.
Tighter control is possible by implementing rate control at the macroblock level and 
allowing the quantizer vector to change within the frame. The issue is whether changes 
to the quantizer vector within the frame would be distinctly perceptible. The final coder 
issue is implementing error concealment at the decoder. The decoder may use error 
concealment to compensate for incomplete information when reconstructing a frame. A 
simple but effective technique implemented here is zeroth order error concealment. If the 
decoder cannot determine if a macroblock should have been updated, the corresponding 
macroblock in the last frame is used by default. This is particularly effective with low 
motion video. More aggressive approaches to consider would employ prediction or 
interpolation to estimate missing coefficients from adjacent macroblocks. 
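Zeroth order concealment amounts to a block copy from the previous reconstructed frame. The sketch below assumes 8-bit grayscale frames stored row by row, matching the coder's current grayscale-only scope, and 16x16 macroblocks; conceal_mb is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MB 16   /* macroblock size in pixels */

/* Zeroth order concealment: replace macroblock (mbx, mby) of the current
 * frame with the co-located macroblock of the previous frame. Both frames
 * are width x height, 8 bits per pixel, stored row by row. */
void conceal_mb(unsigned char *cur, const unsigned char *prev,
                size_t width, size_t height, size_t mbx, size_t mby)
{
    assert((mbx + 1) * MB <= width && (mby + 1) * MB <= height);

    for (size_t y = 0; y < MB; y++) {
        size_t off = (mby * MB + y) * width + mbx * MB;
        memcpy(cur + off, prev + off, MB);   /* copy one row of the block */
    }
}
```

The prediction or interpolation approaches mentioned above would replace the row copy with an estimate built from neighboring macroblocks.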

The MMRP model appears quite effective at representing VBR video, and the associated queuing analysis tools are mature. However, the approach recommended by Skelly et al. [14] uniformly quantizes the video stream. Experimentally determined histograms demonstrate that video, regardless of the motion content, is decidedly non-uniform in distribution [27]. Since MMRP queuing techniques stem from an estimate of the bit rate distribution, an accurate representation of this distribution is essential. Given that video is not distributed uniformly, non-uniform quantization schemes bear examination to improve the representation for a given number of states. One approach is to use Max-Lloyd quantizers [6]; alternatively, an optimal representation could be developed directly from the original histogram.
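As a starting point for such an experiment, Lloyd's iteration (the algorithm behind Max-Lloyd quantizer design) can be run directly on measured per-frame bit counts to place a fixed number of rate levels, which could then serve as state rates. This is a generic sketch, not the procedure of Skelly et al. [14]; lloyd_max and MAX_LEVELS are hypothetical, and the caller must supply the initial levels.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 64

/* Lloyd's algorithm on empirical samples x[0..n-1]. levels[0..L-1] holds the
 * initial reconstruction levels and receives the refined ones. Each pass
 * assigns every sample to its nearest level, then moves each level to the
 * mean of its assigned samples (levels with no samples are left unchanged). */
void lloyd_max(const double *x, size_t n, double *levels, size_t L, int iters)
{
    assert(n > 0 && L > 0 && L <= MAX_LEVELS);

    for (int it = 0; it < iters; it++) {
        double sum[MAX_LEVELS] = {0};
        size_t count[MAX_LEVELS] = {0};

        for (size_t i = 0; i < n; i++) {
            size_t best = 0;
            double best_d = (x[i] - levels[0]) * (x[i] - levels[0]);

            for (size_t q = 1; q < L; q++) {
                double d = (x[i] - levels[q]) * (x[i] - levels[q]);
                if (d < best_d) {
                    best_d = d;
                    best = q;
                }
            }
            sum[best] += x[i];
            count[best]++;
        }
        for (size_t q = 0; q < L; q++)
            if (count[q] > 0)
                levels[q] = sum[q] / (double)count[q];
    }
}
```

Unlike uniform quantization, the resulting levels cluster where the measured bit rates are dense, which is the property the paragraph above argues for.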

The STEBR algorithm provides a powerful, optimal scheduling algorithm for 
CBR and VBR real-time traffic with constraints on CLR and CTD. Two extensions of the algorithm
appear worth further consideration. First, the STEBR algorithm
makes scheduling decisions based on the past history of each connection and the current 
queue state assuming that no further arrivals take place during the current scheduling slot. 
A possible extension is to modify the cost function to consider the impact of predicted 
near-term arrivals for each connection. Predicting future arrivals requires that the 
scheduler maintain a suitable traffic model for each connection or an aggregate of related 
connections. Modeling bursty sources appears difficult in the context of real-time 
scheduling decisions, as opposed to buffer sizing, but predicting the behavior of 
multiplexed traffic, as typified by the approach taken for VBR video in [95], may prove 
feasible. 

Another worthwhile extension to STEBR is to incorporate the UBR and ABR service categories, creating a uniform optimal scheduling algorithm. Because STEBR is cost-based, this extension requires a cost function suitable for each service category. For example, UBR connections can be assigned a permanent cost of one, thus restricting their service unless all other connections are receiving their required QoS. Such an assignment appears suitable since UBR connections are allocated bandwidth left unused by CBR and VBR connections. A suitable cost function for ABR is the ratio of MCR to the instantaneous cell rate granted by the scheduler. However, ABR throughput benefits from feedback that regulates the sender's transmission rate, both to match the available bandwidth and to share it fairly among all active ABR connections. A scheme for incorporating these mechanisms into the STEBR algorithm requires additional consideration.
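A minimal sketch of the category-specific costs described above follows. The enum, the struct fields, and the functions `conn_cost` and `pick_winner` are all hypothetical. CBR and VBR use the CLPR that STEBR already computes: CLP divided by the tolerable CLP, as in the scheduler's SVC_Start state. UBR uses the permanent cost of one, and ABR uses MCR divided by the granted cell rate. A single max-cost pick stands in for STEBR's full slot assignment, and ABR feedback is not modeled.

```c
#include <float.h>

/* Hypothetical service-category labels. */
typedef enum { CAT_CBR, CAT_VBR, CAT_UBR, CAT_ABR } ServiceCategory;

/* Hypothetical per-connection state. */
typedef struct {
    ServiceCategory category;
    long   cells_arrived;   /* CBR/VBR: cells seen so far                */
    long   cells_dropped;   /* CBR/VBR: cells lost so far                */
    double max_clp;         /* CBR/VBR: tolerable CLP (QoS target)       */
    double mcr;             /* ABR: minimum cell rate (cells/s)          */
    double granted_rate;    /* ABR: instantaneous rate granted (cells/s) */
} Connection;

double conn_cost(const Connection *c)
{
    switch (c->category) {
    case CAT_CBR:
    case CAT_VBR:
        /* CLPR = CLP / maxCLP; above one means the QoS target is missed. */
        if (c->cells_arrived == 0)
            return 0.0;
        return ((double) c->cells_dropped / (double) c->cells_arrived) / c->max_clp;
    case CAT_UBR:
        /* Permanent cost of one. */
        return 1.0;
    case CAT_ABR:
        /* MCR over granted rate; a connection granted nothing yet is
         * treated as maximally urgent (an assumption of this sketch). */
        if (c->granted_rate <= 0.0)
            return DBL_MAX;
        return c->mcr / c->granted_rate;
    }
    return 0.0;
}

/* Index of the highest-cost connection (lowest index wins ties). */
int pick_winner(const Connection *conns, int n)
{
    int best = 0, i;
    double best_cost = conn_cost(&conns[0]);

    for (i = 1; i < n; i++) {
        double c = conn_cost(&conns[i]);
        if (c > best_cost) {
            best_cost = c;
            best = i;
        }
    }
    return best;
}
```

With these costs, a UBR connection outranks any connection whose CLPR is below one. It loses to any connection whose CLPR exceeds one, which is a connection not meeting its QoS.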
APPENDIX A. OPNET MODEL CODE


This appendix contains the OPNET process models used to generate the simulation results shown in Chapters V and VI. Each process model consists of a finite state machine and a series of code segments that implement the behavior required for each state.
A. LAYERED VIDEO SCHEDULER 


The OPNET model for the layered scheduler implements the layered scheduling algorithm discussed in Chapter VI. Specifically, STEBR selects the winning connection at the beginning of each service interval, and the CLPRs for each layer are then filtered and compared to determine the winning layer. The code also implements partial GOB dropping as an option. The scheduler assumes that each source transmits using only a single VCC (see Figure II.12). While the code is tailored to the homogeneous traffic case, the model is easily extended to heterogeneous traffic by storing the connection type with the connection's VCC and performing QoS filtering only if the connection is carrying layered video. The finite state machine is shown in Figure A.1.
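The layer-selection step can be isolated from the OPNET kernel calls. The sketch below assumes three layers, as in the source model, and uses a hypothetical function name, `select_layer`. It reproduces the filtering in the SVC_Start state. If the base layer's CLPR exceeds one, both enhancement layers are masked; if layer 1's CLPR exceeds one, layer 2 is masked. The layer with the largest remaining CLPR wins.

```c
#define NUM_LAYERS 3

/* Hypothetical standalone version of the CLPR filter in SVC_Start.
 * clpr[k] is layer k's cell loss ratio divided by maxCLP. Returns the
 * index of the layer that should be served next. */
int select_layer(const double clpr[NUM_LAYERS])
{
    double f[NUM_LAYERS];
    int k, winner = 0;

    for (k = 0; k < NUM_LAYERS; k++)
        f[k] = clpr[k];

    /* Emphasize lower layers: a layer violating its QoS (CLPR > 1)
     * suppresses every layer above it. */
    if (f[0] > 1.0) {
        f[1] = 0.0;
        f[2] = 0.0;
    } else if (f[1] > 1.0) {
        f[2] = 0.0;
    }

    /* Highest filtered CLPR wins; the lower layer wins ties. */
    for (k = 1; k < NUM_LAYERS; k++)
        if (f[k] > f[winner])
            winner = k;
    return winner;
}
```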
Figure A.1: Finite State Machine for Scheduler Process Model.


1. Header Block


#include "ams_pk_support.h"
#include <math.h>

#define QUEUE_EMPTY     (op_q_empty ())
#define SVC_COMPLETION  (op_intrpt_type () == OPC_INTRPT_SELF)
#define ARRIVAL         (op_intrpt_type () == OPC_INTRPT_STRM)
#define VCI_BASE        100
#define MAX_SOURCE      7
#define MAX_LAYER       3
#define DROP            1
#define RETAIN          0
#define NEWHEADER       1

void order_queue (int);
int  expire_cells (void);


2. State Variable Block


int           \server_busy;
double        \service_rate;
Objid         \own_id;
Stathandle    \clp_handle;
int           \cell_count [MAX_SOURCE];
int           \layerCellCount [MAX_SOURCE][MAX_LAYER];
int           \cells_dropped [MAX_SOURCE];
int           \layerCellsDropped [MAX_SOURCE][MAX_LAYER];
double        \pk_svc_time;
Stathandle    \cell_handle;
int           \cells_serviced;
Stathandle    \util_handle;
double        \maxCTD;
double        \maxCLP;
int           \cells_waiting [MAX_SOURCE];
int           \gobDrop [MAX_SOURCE][MAX_LAYER];
Stathandle    \clpr0_handle;
Stathandle    \clpr1_handle;
Stathandle    \clpr2_handle;


3. Temporary Variable Block


Packet*                   pkptr;
int                       insert_ok;
int                       num_cells;
int                       ix, jx;
int                       source_id;
int                       layer_id;
AtmT_Cell_Header_Fields*  atm_hdr_ptr;
int                       total_arrived;
int                       total_dropped;
double                    clp;
int                       cell_to_send;
double                    iCLP;
double                    clpr [MAX_SOURCE];
double                    delta [MAX_SOURCE];
double                    max_clpr;
int                       winner;
int                       winningLayer;
int                       extra_cells [MAX_SOURCE];
double                    cost [MAX_SOURCE];
int*                      service_slot;
int*                      slotSourceID;
int*                      slotLayerID;
double*                   cell_cost;
int                       q_index;
int                       slot;
int                       max_index;
double                    max_cost;
int                       done;
double                    iLayerCLP;
double                    layerCLPR [MAX_SOURCE][MAX_LAYER];
int                       layerCellsWaiting [MAX_SOURCE][MAX_LAYER];
double                    filteredCost [MAX_LAYER];
double                    filteredCLPR [MAX_LAYER];
double                    max_CLPR;
4. Function Block


expire_cells() removes from the queue the cells that have expired, as well as those that must be discarded under the partial GOB dropping algorithm. With partial GOB dropping, a flag indicates the status of each layer within a connection. An expired cell toggles its layer's flag to "DROP"; if the expired cell belongs to the base layer, the flags for all of the other layers are set to "DROP" as well. GOB headers reset the flags to "RETAIN". Partial GOB dropping may be disabled by commenting out the lines highlighted in bold.


int expire_cells (void)
{
  int num_cells;
  int source_id;
  int layer_id;
  int gobHeader;
  int ix;
  Packet* pkptr;
  AtmT_Cell_Header_Fields* atm_hdr_ptr;

  /* Find the number of cells in the queue. */
  num_cells = (int) op_subq_stat (0, OPC_QSTAT_PKSIZE);

  /* Remove cells that cannot complete service before expiring, */
  /* working from the head to the tail of the queue. */
  ix = 0;
  while (ix < num_cells) {
    pkptr = op_subq_pk_access (0, ix);
    op_pk_nfd_get (pkptr, "header fields", &atm_hdr_ptr);
    source_id = (atm_hdr_ptr->VCI - VCI_BASE);
    layer_id = atm_hdr_ptr->PT + 2*atm_hdr_ptr->CLP;
    gobHeader = atm_hdr_ptr->GFC;

    if (gobDrop[source_id][layer_id] == DROP) {
      if (gobHeader == NEWHEADER) {
        if ((maxCTD - op_q_wait_time (pkptr)) < pk_svc_time) {
          /* The GOB header itself has expired: discard it. */
          pkptr = op_subq_pk_remove (0, ix);
          op_pk_destroy (pkptr);
          op_prg_mem_free (atm_hdr_ptr);
          cells_dropped[source_id]++;
          layerCellsDropped[source_id][layer_id]++;
          num_cells--;
          cells_waiting[source_id]--;
        }
        else {
          /* A new GOB begins: reset the drop flags. */
          if (layer_id == 0) {
            gobDrop[source_id][0] = RETAIN;
            gobDrop[source_id][1] = RETAIN;
            gobDrop[source_id][2] = RETAIN;
          }
          else {
            if (gobDrop[source_id][0] == RETAIN) {
              gobDrop[source_id][layer_id] = RETAIN;
            }
          }
          /* Reload the header field struct. */
          op_pk_nfd_set (pkptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create,
                         op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
          ix++;
        }
      }
      else {
        /* Cell belongs to a corrupted GOB: discard it. */
        pkptr = op_subq_pk_remove (0, ix);
        op_pk_destroy (pkptr);
        op_prg_mem_free (atm_hdr_ptr);
        cells_dropped[source_id]++;
        layerCellsDropped[source_id][layer_id]++;
        num_cells--;
        cells_waiting[source_id]--;
      }
    }
    else if ((maxCTD - op_q_wait_time (pkptr)) < pk_svc_time) {
      /* Cell has expired: discard it and mark the GOB as corrupted. */
      pkptr = op_subq_pk_remove (0, ix);
      op_pk_destroy (pkptr);
      op_prg_mem_free (atm_hdr_ptr);
      cells_dropped[source_id]++;
      layerCellsDropped[source_id][layer_id]++;
      num_cells--;
      cells_waiting[source_id]--;
      gobDrop[source_id][layer_id] = DROP;
      if (layer_id == 0) {
        gobDrop[source_id][1] = DROP;
        gobDrop[source_id][2] = DROP;
      }
    }
    else {
      /* Reload the header field struct. */
      op_pk_nfd_set (pkptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create,
                     op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
      ix++;
    }
  } /* while */

  return num_cells;
}
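The flag logic of partial GOB dropping is easier to follow without the queue handling. The sketch below assumes three layers and uses hypothetical names: `GobFlags`, `gob_init`, `gob_on_expired`, `gob_on_header`, and `gob_should_discard`. It mirrors the rules in expire_cells():

- An expired cell marks its own layer DROP, and marks every layer DROP if it belongs to the base layer.
- An unexpired GOB header in the base layer resets all layers to RETAIN.
- A header in an enhancement layer resets only that layer, and only if the base layer is intact.
- While a layer is marked DROP, its remaining cells are discarded.

```c
#define LAYERS 3
#define RETAIN 0
#define DROP   1

typedef struct { int flag[LAYERS]; } GobFlags;   /* hypothetical */

void gob_init(GobFlags *g)
{
    for (int k = 0; k < LAYERS; k++)
        g->flag[k] = RETAIN;
}

/* A cell of `layer` expired: drop that layer (all layers if base). */
void gob_on_expired(GobFlags *g, int layer)
{
    g->flag[layer] = DROP;
    if (layer == 0)
        for (int k = 1; k < LAYERS; k++)
            g->flag[k] = DROP;
}

/* An unexpired GOB header arrived on a layer that was dropping. */
void gob_on_header(GobFlags *g, int layer)
{
    if (layer == 0) {
        for (int k = 0; k < LAYERS; k++)
            g->flag[k] = RETAIN;
    } else if (g->flag[0] == RETAIN) {
        g->flag[layer] = RETAIN;
    }
}

/* Fate of one queued cell; returns 1 if it should be discarded. */
int gob_should_discard(GobFlags *g, int layer, int is_header, int expired)
{
    if (g->flag[layer] == DROP) {
        if (!is_header || expired)
            return 1;                /* still inside a corrupted GOB */
        gob_on_header(g, layer);
        return 0;
    }
    if (expired) {
        gob_on_expired(g, layer);
        return 1;
    }
    return 0;
}
```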
order_queue() reorders the queue in order of increasing ToE from the head of the queue.


void order_queue (int num_cells)
{
  double *ToE;
  double temp;
  int ix, jx;
  int sorted;
  Packet* pkptr;

  /* Allocate memory for array consisting of ToE entries. */
  ToE = (double*) op_prg_mem_alloc (num_cells * sizeof (double));

  /* Parse the queue and determine each cell's ToE. */
  for (ix = 0; ix < num_cells; ix++) {
    pkptr = op_subq_pk_access (0, ix);
    ToE[ix] = maxCTD - op_q_wait_time (pkptr);
  }

  /* Queue is originally unsorted. */
  sorted = OPC_FALSE;

  /* Perform a bubble sort. */
  for (ix = 0; !(sorted) && ix < (num_cells - 1); ix++) {
    sorted = OPC_TRUE;
    for (jx = 0; jx < (num_cells - ix - 1); jx++) {
      if (ToE[jx] > ToE[jx+1]) {
        temp = ToE[jx];
        ToE[jx] = ToE[jx+1];
        ToE[jx+1] = temp;
        op_subq_pk_swap (0, jx, jx+1);
        sorted = OPC_FALSE;
      }
    }
  }

  /* Free the memory. */
  op_prg_mem_free (ToE);
}
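order_queue() uses a bubble sort with early exit. This suits short queues but is O(n²) in the worst case. One alternative, not used in the thesis, is to sort an index array by ToE with the C library's qsort() and then apply the resulting permutation to the queue. The sketch below shows only the ordering step, on plain arrays. `CellRef`, `by_toe`, and `sort_by_toe` are hypothetical. As in order_queue(), ToE is taken as maxCTD minus the waiting time.

```c
#include <stdlib.h>

/* Hypothetical record pairing a cell's time to expiry with its position. */
typedef struct {
    double toe;   /* maxCTD minus the cell's waiting time */
    int    pos;   /* position in the queue, head = 0 */
} CellRef;

/* Increasing ToE; equal ToE keeps queue order, as the bubble sort does. */
static int by_toe(const void *a, const void *b)
{
    const CellRef *x = a, *y = b;
    if (x->toe != y->toe)
        return (x->toe > y->toe) - (x->toe < y->toe);
    return (x->pos > y->pos) - (x->pos < y->pos);
}

/* Writes to order[] the queue positions sorted by increasing ToE.
 * Returns 0 on success, -1 if memory could not be allocated. */
int sort_by_toe(const double *wait, int n, double max_ctd, int *order)
{
    CellRef *ref;
    int i;

    if (n <= 0)
        return 0;
    ref = malloc((size_t) n * sizeof *ref);
    if (ref == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        ref[i].toe = max_ctd - wait[i];
        ref[i].pos = i;
    }
    qsort(ref, (size_t) n, sizeof *ref, by_toe);
    for (i = 0; i < n; i++)
        order[i] = ref[i].pos;
    free(ref);
    return 0;
}
```

qsort() is not guaranteed to be stable. The tie-break on `pos` restores the FIFO order among cells with equal ToE that the bubble sort preserves naturally.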


5. Init State


The Init State initializes all statistics and counters and sets the QoS parameters required for each connection. Since only homogeneous traffic is considered, only a single set of parameters is listed.
/* initially the server is idle */
server_busy = 0;

/* get queue module's own object id */
own_id = op_id_self ();

/* get assigned value of server processing rate */
op_ima_obj_attr_get (own_id, "service_rate", &service_rate);
pk_svc_time = 1.0 / service_rate;

/* Declare local statistics. */
clp_handle   = op_stat_reg ("CLP", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
cell_handle  = op_stat_reg ("Time", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
util_handle  = op_stat_reg ("Utilization", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
clpr0_handle = op_stat_reg ("CLPR0", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
clpr1_handle = op_stat_reg ("CLPR1", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);
clpr2_handle = op_stat_reg ("CLPR2", OPC_STAT_INDEX_NONE, OPC_STAT_LOCAL);

/* Clear the per-connection and per-layer counters. */
for (ix = 0; ix < MAX_SOURCE; ix++) {
  cell_count[ix] = 0;
  cells_dropped[ix] = 0;
  cells_waiting[ix] = 0;
  for (jx = 0; jx < MAX_LAYER; jx++) {
    gobDrop[ix][jx] = RETAIN;
    layerCellCount[ix][jx] = 0;
    layerCellsDropped[ix][jx] = 0;
  }
}

cells_serviced = 0;
op_stat_write (cell_handle, (double) cell_count[0]);

/* Declare the QoS parameters. */
maxCTD = 0.050;
maxCLP = 0.001;


6. Arrival State


The Arrival State acquires arriving cells and updates the connection statistics. Each cell arrival also triggers recording of the CLP QoS statistic.


/* acquire the arriving packet */
/* multiple arriving streams are supported. */
pkptr = op_pk_get (op_intrpt_strm ());

/* Get the source ID from the VCI and increment arrival count for the
   source and layer. */
op_pk_nfd_get (pkptr, "header fields", &atm_hdr_ptr);
source_id = (atm_hdr_ptr->VCI - VCI_BASE);
layer_id = atm_hdr_ptr->PT + 2*atm_hdr_ptr->CLP;
cell_count[source_id]++;
layerCellCount[source_id][layer_id]++;

/* Reload the header field struct. */
op_pk_nfd_set (pkptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create,
               op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));

/* attempt to enqueue the packet at tail of subqueue 0 */
if (op_subq_pk_insert (0, pkptr, OPC_QPOS_TAIL) != OPC_QINS_OK) {
  /* the insertion failed (due to a full queue) */
  /* deallocate the packet */
  op_pk_destroy (pkptr);
  cells_dropped[source_id]++;
  layerCellsDropped[source_id][layer_id]++;

  /* set flag indicating insertion fail */
  /* this flag is used to determine transition */
  /* out of this state */
  insert_ok = 0;
}
else {
  /* insertion was successful */
  insert_ok = 1;
  cells_waiting[source_id]++;
}

/* Capture connection statistics. */
total_arrived = 0;
total_dropped = 0;
for (ix = 0; ix < MAX_SOURCE; ix++) {
  total_arrived += cell_count[ix];
  total_dropped += cells_dropped[ix];
}

clp = ((double) total_dropped) / total_arrived;

if (op_sim_time () > 0.0) {
  op_stat_write (clp_handle, clp);
}

if (op_sim_time () > 0.0) {
  op_stat_write (cell_handle, ((double) total_arrived) / op_sim_time ());
}

/* Per-layer CLP for source 1. */
if (layerCellCount[1][0] > 0) {
  op_stat_write (clpr0_handle, ((double) layerCellsDropped[1][0]) / layerCellCount[1][0]);
}
if (layerCellCount[1][1] > 0) {
  op_stat_write (clpr1_handle, ((double) layerCellsDropped[1][1]) / layerCellCount[1][1]);
}
if (layerCellCount[1][2] > 0) {
  op_stat_write (clpr2_handle, ((double) layerCellsDropped[1][2]) / layerCellCount[1][2]);
}

} 


7. SVC_Start State


The SVC_Start state determines which cell to process after removing expired cells and discarding cells from corrupted GOBs. STEBR determines the winning connection, and the winning layer is then determined after cost filtering. Service deferral is optional. Code segments highlighted in bold text indicate where the cost-filtering algorithm can be altered and where service deferral may be activated.


/* In this state, at least one cell may require service. Find the
   number of cells. */
num_cells = expire_cells ();

/* Sort the queue in descending order of ToE from the tail of the
   queue. */
if (num_cells > 0) {
  order_queue (num_cells);
}

/* Update the CLP ratios and the delta cost. */
for (ix = 0; ix < MAX_SOURCE; ix++) {
  iCLP = 0.0;
  delta[ix] = 0.0;
  if (cell_count[ix] > 0) {
    iCLP = ((double) cells_dropped[ix]) / cell_count[ix];
    delta[ix] = 1.0 / (cell_count[ix] * maxCLP);
  }
  clpr[ix] = iCLP / maxCLP;

  /* Update the layer statistics. */
  for (jx = 0; jx < MAX_LAYER; jx++) {
    iLayerCLP = 0.0;
    if (layerCellCount[ix][jx] > 0) {
      iLayerCLP = ((double) layerCellsDropped[ix][jx]) / layerCellCount[ix][jx];
    }
    layerCLPR[ix][jx] = iLayerCLP / maxCLP;
  }
}

/* Initialize the connection costs and the extra cell counts. */
for (ix = 0; ix < MAX_SOURCE; ix++) {
  extra_cells[ix] = 0;
  cost[ix] = -1.0;

  /* Initialize the current layer count. */
  for (jx = 0; jx < MAX_LAYER; jx++) {
    layerCellsWaiting[ix][jx] = 0;
  }
}

/* Determine the current service slots and cost of each cell in the
   queue. */
if (num_cells > 0) {
  cell_cost    = (double*) op_prg_mem_alloc (num_cells * sizeof (double));
  service_slot = (int*) op_prg_mem_alloc (num_cells * sizeof (int));
  slotSourceID = (int*) op_prg_mem_alloc (num_cells * sizeof (int));
  slotLayerID  = (int*) op_prg_mem_alloc (num_cells * sizeof (int));
}

for (ix = 0; ix < num_cells; ix++) {
  pkptr = op_subq_pk_access (0, ix);
  service_slot[ix] = (int) floor ((maxCTD - op_q_wait_time (pkptr)) / pk_svc_time);
  op_pk_nfd_get (pkptr, "header fields", &atm_hdr_ptr);
  source_id = atm_hdr_ptr->VCI - VCI_BASE;
  layer_id = atm_hdr_ptr->PT + 2*atm_hdr_ptr->CLP;
  slotSourceID[ix] = source_id;
  slotLayerID[ix] = layer_id;
  layerCellsWaiting[source_id][layer_id]++;
  op_pk_nfd_set (pkptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create,
                 op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
  clpr[source_id] += delta[source_id];
  cell_cost[ix] = clpr[source_id];
}

/* STEBR starts here! */

/* Grant service! */
if (num_cells > 0) {
  /* Work from tail of queue forward to head. */
  q_index = num_cells - 1;
  done = OPC_FALSE;

  for (slot = service_slot[num_cells-1]; (slot >= 0) && (done != OPC_TRUE); slot--) {
    /* Examine cells in the current time slot. */
    while ((q_index >= 0) && (service_slot[q_index] == slot)) {
      source_id = slotSourceID[q_index];
      layer_id = slotLayerID[q_index];

      /* Cost out the source. */
      if (cost[source_id] >= 0) {
        extra_cells[source_id]++;
      }
      else {
        cost[source_id] = cell_cost[q_index];
      }
      q_index--;
    }

    /* Determine which connection is granted service in current slot
       based only on connection costs. */
    max_cost = cost[0];
    max_index = 0;
    for (ix = 1; ix < MAX_SOURCE; ix++) {
      if (cost[ix] > max_cost) {
        max_cost = cost[ix];
        max_index = ix;
      }
    }

    /* Assign the source to this slot if there are cells available. */
    if (cost[max_index] >= 0) {
      winner = max_index;

      /* Source has only one cell in the interval. */
      if (extra_cells[max_index] == 0) {
        cost[max_index] = -1;
      }
      /* Source has more than one cell in the interval. */
      else {
        extra_cells[max_index]--;
        cost[max_index] = cost[max_index] - delta[max_index];
      }

      /* Load the layer costs. */
      for (ix = 0; ix < MAX_LAYER; ix++) {
        filteredCost[ix] = layerCLPR[winner][ix];
      }
    }
    else if (q_index < 0) {
      done = OPC_TRUE;
    }
  }

  /* Locate a cell from the winning source. */
  /* Prune the costs of the winning source. */
  for (jx = 1; jx < MAX_LAYER; jx++) {
    if (layerCellsWaiting[winner][jx] == 0) {
      layerCLPR[winner][jx] = 0;
    }
  }

  /* Find the winning layer from the source. */
  for (ix = 0; ix < MAX_LAYER; ix++) {
    filteredCLPR[ix] = layerCLPR[winner][ix];
  }

  /* Filter the CLPRs to emphasize lower layers. */
  if (filteredCLPR[0] > 1.0) {
    filteredCLPR[1] = 0.0;
    filteredCLPR[2] = 0.0;
  }
  else if (filteredCLPR[1] > 1.0) {
    filteredCLPR[2] = 0.0;
  }

  /* Pick the layer with highest CLPR. */
  winningLayer = 0;
  max_CLPR = filteredCLPR[0];
  for (ix = 1; ix < MAX_LAYER; ix++) {
    if (filteredCLPR[ix] > max_CLPR) {
      winningLayer = ix;
      max_CLPR = filteredCLPR[ix];
    }
  }

  cell_to_send = 0;
  for (ix = 0; ix < num_cells; ix++) {
    if ((slotSourceID[ix] == winner) && (slotLayerID[ix] == winningLayer)) {
      cell_to_send = ix;
      break;
    }
  }

  /* Activate service deferral here. */
  if (service_slot[cell_to_send] > 2) {
    for (ix = 0; ix < num_cells; ix++) {
      if (slotSourceID[ix] == winner) {
        cell_to_send = ix;
        break;
      }
    }
  }

  /* Bubble the cell to head of the queue. */
  if (cell_to_send >= 0) {
    for (ix = cell_to_send; ix > 0; ix--) {
      op_subq_pk_swap (0, ix, ix-1);
    }
  }

  /* Grant service to the cell. */
  op_intrpt_schedule_self (op_sim_time () + pk_svc_time, 0);

  /* The server is now busy. */
  server_busy = 1;
}

/* Free memory. */
if (num_cells > 0) {
  op_prg_mem_free (cell_cost);
  op_prg_mem_free (service_slot);
  op_prg_mem_free (slotSourceID);
  op_prg_mem_free (slotLayerID);
}

8. SVC_Complete State


The SVC_Complete State removes from the queue the cell that has just finished transmission and forwards it on stream 0.


/* Cell at the head of the queue */
/* is just finishing service */
pkptr = op_subq_pk_remove (0, OPC_QPOS_HEAD);

/* Update the source cells waiting count. */
op_pk_nfd_get (pkptr, "header fields", &atm_hdr_ptr);
source_id = (atm_hdr_ptr->VCI - VCI_BASE);
op_prg_mem_free (atm_hdr_ptr);
cells_waiting[source_id]--;

/* forward the packet on stream 0, */
/* causing an immediate interrupt at dest. */
op_pk_send_forced (pkptr, 0);

/* server is idle again. */
server_busy = 0;


B. LAYERED VIDEO SOURCE 


The layered video process model represents up to N layered video sources, each using a six-state MMRP with a deterministic arrival process. Cells from every layer of a particular source are multiplexed over a single VCI, so each cell is tagged using the scheme shown in Table II.3 to identify its parent layer. The finite state machine is shown in Figure A.2.
Figure A.2: Finite State Machine for a Layered Video Traffic Model.


1. Header Block


#include "ams_pk_support.h"

#define SEND_CELL     0
#define CHANGE_STATE  100
#define MAX_SOURCE    7
#define MAX_LAYER     3

#define NEW_STATE ((op_intrpt_type () == OPC_INTRPT_SELF) && \
                   (op_intrpt_code () >= CHANGE_STATE))

#define NEW_CELL  ((op_intrpt_type () == OPC_INTRPT_SELF) && \
                   ((op_intrpt_code () >= SEND_CELL) && \
                    (op_intrpt_code () <= (SEND_CELL + (MAX_SOURCE+1)*10))))

#define INF       1.0e30   /* assumed value: an effectively infinite mean holding time */
#define VCI_BASE  100

/* Event code = {Event type}{Source ID}{Layer ID} for 3 decimal digits. */

/* Cells are tagged by VCI = VCI_BASE + Source_ID. */

/* Originating layer is indicated by the SDU and CLP bits. */

AtmT_Cell_Header_Fields* set_header (int, int);


2. State Variable Block


Objid           \self_id;
int             \curr_state [MAX_SOURCE];
int             \next_state [MAX_SOURCE];
Distribution**  \state_dist;
double          \transit_time;
double          \interval;
Stathandle      \state0_shandle;
Stathandle      \state1_shandle;
Stathandle      \rate_shandle;
int             \sources;
Evhandle        \cell_intrpt [MAX_SOURCE];
int             \layer_state [MAX_SOURCE];


3. Temporary Variable Block 


double M[6][6] = {{0.000, 1.807, 0.636, 0.153, 0.025, 0.000},
                  {1.240, 0.000, 0.288, 0.399, 0.044, 0.022},
                  {5.667, 0.833, 0.000, 0.167, 0.000, 0.000},
                  {2.800, 3.920, 0.280, 0.000, 0.000, 0.000},
                  {7.000, 0.000, 0.000, 0.000, 0.000, 0.000},
                  {0.000, 7.000, 0.000, 0.000, 0.000, 0.000}};
double Tambdarol) = (isZimec, 232-5, 052.07 ,49e-90, Soe eo e eo 


Packet*                   cell_ptr;
int                       ix;
int                       jx;
int                       source_id;
int                       session_id;
int                       layer_id;
AtmT_Cell_Header_Fields*  atm_hdr_ptr;


4. Function Block 


set_header() creates an ATM cell header structure with the appropriate SDU- and 
CLP-bit tags for the layer. 


AtmT_Cell_Header_Fields* set_header (int source_id, int layer_id)
{
  AtmT_Cell_Header_Fields* atm_hdr_ptr;

  /* Allocate memory for header fields. */
  atm_hdr_ptr = (AtmT_Cell_Header_Fields*) op_prg_mem_alloc (sizeof (AtmT_Cell_Header_Fields));

  /* Load the VCI. */
  atm_hdr_ptr->VCI = VCI_BASE + source_id;

  /* Identify the layer. */
  switch (layer_id) {
    case 0:
      atm_hdr_ptr->PT = 0;
      atm_hdr_ptr->CLP = 0;
      break;
    case 1:
      atm_hdr_ptr->PT = 1;
      atm_hdr_ptr->CLP = 0;
      break;
    case 2:
      atm_hdr_ptr->PT = 0;
      atm_hdr_ptr->CLP = 1;
      break;
  }

  return atm_hdr_ptr;
}
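set_header() encodes the layer into two header bits, and the scheduler decodes it as layer = PT + 2·CLP. That tagging scheme can be checked in isolation. The sketch below uses a hypothetical two-field struct in place of `AtmT_Cell_Header_Fields`, and the names `LayerTag`, `tag_layer`, and `layer_of` are illustrative.

```c
/* Hypothetical stand-in for the two header fields used as layer tags. */
typedef struct {
    int PT;    /* payload-type (SDU) bit used for tagging */
    int CLP;   /* cell loss priority bit */
} LayerTag;

/* Encoding used by set_header(): layer 0 -> (0,0), 1 -> (1,0), 2 -> (0,1). */
LayerTag tag_layer(int layer_id)
{
    LayerTag t = {0, 0};
    if (layer_id == 1)
        t.PT = 1;
    else if (layer_id == 2)
        t.CLP = 1;
    return t;
}

/* Decoding used by the scheduler process model. */
int layer_of(LayerTag t)
{
    return t.PT + 2 * t.CLP;
}
```

Only the top enhancement layer carries CLP = 1. Any CLP-based selective discard in the network therefore falls on that layer first.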


5. Init State


The Init State creates an array of exponential distributions to represent transitions between states in the MMRP model. Each source is arbitrarily started in state 0.
/* get source module's own object id */
self_id = op_id_self ();

/* get the requested number of multiplexed video sources */
op_ima_obj_attr_get (self_id, "Number_of_Sources", &sources);

/* allocate space and load distributions */
state_dist = (Distribution**) op_prg_mem_alloc (sizeof (Distribution*) * 36);

for (ix = 0; ix < 6; ix++) {
  for (jx = 0; jx < 6; jx++) {
    if (M[ix][jx] > 0.0) {
      state_dist[ix*6+jx] = op_dist_load ("exponential", 1.0/M[ix][jx], 0);
    }
    else {
      state_dist[ix*6+jx] = op_dist_load ("exponential", INF, 0);
    }
  }
}

/* generate an initial interrupt for each source, arbitrarily */
/* choosing the 0th state. */
for (ix = 0; ix < sources; ix++) {
  next_state[ix] = 0;
  op_intrpt_schedule_self (op_sim_time (), CHANGE_STATE + 10*ix);
}


6. Transition State


The Transition State handles a source's transition between states in the MMRP model. It determines the time until the next transition and updates the source's arrival rate to reflect the current state.


/* One of the sources is changing state; get the source's id. */
session_id = op_intrpt_code () - CHANGE_STATE;
source_id = session_id/10;

/* Cancel the pending cell transmission self-interrupt for this source. */
if (op_ev_valid (cell_intrpt[source_id])) {
  op_ev_cancel (cell_intrpt[source_id]);
}

/* Assign the new current state. */
curr_state[source_id] = next_state[source_id];

/* Find next state and transition time. */
next_state[source_id] = 0;
transit_time = op_dist_outcome (state_dist[curr_state[source_id]*6]);

/* Search for the shortest time; this is the next state. */
for (ix = 1; ix < 6; ix++) {
  interval = op_dist_outcome (state_dist[curr_state[source_id]*6 + ix]);
  if (interval < transit_time) {
    transit_time = interval;
    next_state[source_id] = ix;
  }
}

/* Reset the layer state counter. */
layer_state[source_id] = 0;
layer_id = layer_state[source_id];

/* Create a new formatted ATM cell. */
cell_ptr = op_pk_create_fmt ("ams_atm_cell");

/* Allocate memory for the header and assign fields. */
atm_hdr_ptr = set_header (source_id, layer_id);

/* ID the first cell of a GOB. */
atm_hdr_ptr->GFC = 1;

/* Load the ATM header and transmit the cell. */
op_pk_nfd_set (cell_ptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create,
               op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
op_pk_send (cell_ptr, 0);
cell_intrpt[source_id] = op_intrpt_schedule_self (op_sim_time () +
    1.0/lambda[curr_state[source_id]], SEND_CELL + 10*source_id);

/* Schedule state transition. */
op_intrpt_schedule_self (op_sim_time () + transit_time, CHANGE_STATE + 10*source_id);
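Together, the Init and Transition states implement the competing-exponentials construction for a continuous-time Markov chain. From state i, a holding time is drawn toward every state j with mean 1/M[i][j], or with an effectively infinite mean where the rate is zero. The smallest draw fixes both the next state and the time until it. The sketch below isolates that search. The draw is a caller-supplied function standing in for op_dist_outcome(), and the deterministic `mean_draw` exists only to check the logic. `HoldingTimeDraw`, `mean_draw`, `next_state`, and the `NEVER` constant are hypothetical.

```c
#define NSTATES 6
#define NEVER   1.0e30   /* hypothetical stand-in for INF: never the minimum */

/* Caller-supplied draw of a holding time with the given mean; in the
 * OPNET model this role is played by op_dist_outcome(). */
typedef double (*HoldingTimeDraw)(double mean, void *ctx);

/* Deterministic stand-in that returns the mean itself (for checking). */
double mean_draw(double mean, void *ctx)
{
    (void) ctx;
    return mean;
}

/* One MMRP transition from state cur: draw a holding time toward each
 * state and let the smallest draw pick the next state and its time. */
int next_state(double M[NSTATES][NSTATES], int cur,
               HoldingTimeDraw draw, void *ctx, double *transit_time)
{
    int j, next = 0;
    double t, best = 0.0;

    for (j = 0; j < NSTATES; j++) {
        t = draw(M[cur][j] > 0.0 ? 1.0 / M[cur][j] : NEVER, ctx);
        if (j == 0 || t < best) {
            best = t;
            next = j;
        }
    }
    *transit_time = best;
    return next;
}
```

With genuine exponential draws, state j is chosen with probability M[i][j] divided by the row's total rate, and the holding time is exponential with that total rate. These are the defining properties of the chain whose generator appears in Appendix B.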


7. Send_cell State


The Send_cell State transmits a new cell and schedules the next departure using the current arrival rate. In addition, the state determines which layer the cell belongs to. The bit allocation among layers, each layer's GOB length, and the manner of interleaving are all set here.

/* One of the sources is sending a cell; get the source's id. */
session_id = op_intrpt_code () - SEND_CELL;
source_id = session_id/10;

/* Determine the layer id. */
layer_state[source_id]++;
if (layer_state[source_id] > 7) {
  layer_state[source_id] = 0;
}
switch (layer_state[source_id]) {
  case 0: case 1: case 2: case 3:
    layer_id = 0;
    break;
  case 4: case 5:
    layer_id = 1;
    break;
  case 6: case 7:
    layer_id = 2;
    break;
}

/* Create a new formatted ATM cell. */
cell_ptr = op_pk_create_fmt ("ams_atm_cell");

/* Allocate memory for the header and assign fields. */
atm_hdr_ptr = set_header (source_id, layer_id);

/* ID the first cell of a GOB. */
if (layer_state[source_id] == 0) {
  atm_hdr_ptr->GFC = 1;
}
else {
  atm_hdr_ptr->GFC = 0;
}

/* Load the ATM header and transmit the cell. */
op_pk_nfd_set (cell_ptr, "header fields", atm_hdr_ptr, op_prg_mem_copy_create,
               op_prg_mem_free, sizeof (AtmT_Cell_Header_Fields));
op_pk_send (cell_ptr, 0);

/* Schedule next cell departure. */
cell_intrpt[source_id] = op_intrpt_schedule_self (op_sim_time () +
    1.0/lambda[curr_state[source_id]], SEND_CELL + 10*source_id);

APPENDIX B. MMRP MODEL PARAMETERS


The MMRP model parameters used in the OPNET simulations were developed using the procedure outlined in Section V.B. In accordance with the discussion presented in [90], the rate-controlled video trace shown in Figure B.1 was quantized to six levels. Table B.1 gives the state distribution vector calculated for six states, together with the associated state arrival rates. Table B.2 gives the associated infinitesimal generating function for a frame rate of 10 fps.
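Tables B.1 and B.2 can be cross-checked against each other. Each row of an infinitesimal generator Q must sum to zero, and the state distribution π must satisfy πQ = 0. The sketch below computes both residuals. The function names are hypothetical. For the tabulated values, both residuals come out below 10⁻³, consistent with the four-digit rounding of the tables.

```c
#define NS 6

/* Largest |row sum| of Q; zero for an exact generator. */
double max_row_sum(double Q[NS][NS])
{
    double worst = 0.0, s;
    int i, j;

    for (i = 0; i < NS; i++) {
        s = 0.0;
        for (j = 0; j < NS; j++)
            s += Q[i][j];
        if (s < 0.0) s = -s;
        if (s > worst) worst = s;
    }
    return worst;
}

/* Largest |(pi Q)_j|; zero when pi is the stationary distribution of Q. */
double max_balance_residual(const double pi[NS], double Q[NS][NS])
{
    double worst = 0.0, s;
    int i, j;

    for (j = 0; j < NS; j++) {
        s = 0.0;
        for (i = 0; i < NS; i++)
            s += pi[i] * Q[i][j];
        if (s < 0.0) s = -s;
        if (s > worst) worst = s;
    }
    return worst;
}
```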





[Plot versus Frame Number, frames 0 to 700.]
Figure B.1: Rate-controlled VBR Video Sequence.
State i          1        2        3        4        5             6
π_i         0.4136   0.4806   0.0618   0.0379   0.0045        0.0015
λ_i (cps)   132.02   232.89   332.87   432.90   [illegible]   687.05


Table B.1: State Probabilities and Arrival Rates for Quantized Video Source.
 -2.6218   1.8073   0.6364   0.1527   0.0255   0
  1.2405  -1.9937   0.2880   0.3987   0.0443   0.0222
  5.6667   0.8333  -6.6667   0.1667   0        0
  2.8000   3.9200   0.2800  -7.0000   0        0
  7.0000   0        0        0       -7.0000   0
  0        7.0000   0        0        0       -7.0000


Table B.2: Infinitesimal Generating Function for Quantized Video Source.


For this sequence, six states give excellent results. Figure B.2 demonstrates how closely the MMRP captures the histogram of the original source. The mean bit rate is overpredicted, but remains within 1% of the actual mean bit rate. Figure B.3 displays the autocorrelation functions of both the model and the sequence, illustrating a close match over a period of 30 seconds. Figure B.3 also includes the model autocorrelation function when seven states are used. How closely the model tracks the autocorrelation function depends on how accurately it predicts the mean bit rate. For this sequence, using seven states produces a larger overprediction of the mean bit rate, which leads to the bias displayed in tracking the autocorrelation function. Increasing the number of states did not guarantee better results until a prohibitively large number of states was employed.


[Plot of Probability versus State Number, states 1 to 6.]
Figure B.2: Predicted and Actual Histograms.
Figure B.3: Actual and Predicted Autocorrelation Functions.